AI HW Systems Engineering and Debug Lead

Location

Remote

Type

Full Time

Job Description

Austin, TX, USA
Hybrid
Expert/Leader
Artificial Intelligence • Semiconductor
Joining Graphcore gives you a seat at the top table, shaping the future of Artificial Intelligence.

About us 

Graphcore is one of the world’s leading innovators in Artificial Intelligence compute. It is developing hardware, software, and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world’s most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.

Graphcore’s teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers, and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.

Job Summary 

We are seeking an experienced AI HW Systems Engineering and Debug Lead to drive system-level debug and bring-up activities for Graphcore’s next-generation AI data center platforms. 

The successful candidate will lead complex debug efforts across hardware, firmware, and software layers for blade and rack-level systems. This role focuses on developing scalable debug strategies, improving debug throughput, and ensuring timely resolution of system-level issues throughout the product lifecycle.

The Team 

Graphcore is a globally recognised leader in Artificial Intelligence computing systems. The company designs advanced semiconductors and data centre hardware that provide the specialised processing power needed to drive AI innovation while delivering the efficiency required to support its broader adoption.

The Systems Engineering and Validation team ensures Graphcore’s AI compute platforms are fully validated, debugged, and ready for deployment in hyperscale data center environments.

The team collaborates closely with silicon engineering, system architecture, firmware, operating system, and rack integration teams to identify and resolve system-level issues and drive improvements in validation and debug methodologies.

Responsibilities and Duties 

  • Own and develop AI systems debug methodology and system bring-up strategies for next-generation AI data center platforms. 
  • Lead system-level debug and root cause analysis for issues identified during server rack validation, post-silicon validation, and production phases.
  • Drive complex debug efforts across silicon, hardware platforms, firmware, operating systems, and software stacks.
  • Manage and track technical issues, risks, and priorities to ensure program milestones are achieved.
  • Publish debug program indicators and metrics to identify roadblocks and improve debug throughput. 
  • Coordinate cross-functional teams, including system architecture, silicon, firmware, and validation teams, to resolve system-level issues.
  • Lead development and integration of debug tools, scripts, and methodologies to improve debug efficiency.
  • Communicate program status, risks, and technical findings to engineering leadership and stakeholders.

Candidate Profile 

Essential 

  • Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, or a related discipline.
  • 15+ years of experience working on complex systems engineering challenges involving HW/FW/SW debug in server or data center environments. 
  • Proven experience leading validation and debug for board, blade, and rack-level hardware platforms.
  • Strong experience debugging OS, firmware, silicon, and hardware issues.
  • Understanding of industry-standard system buses such as PCIe and CXL and their software stacks. 
  • Strong knowledge of ARM or x86 CPU architectures, SoC design, memory systems, and power management.
  • Experience with system architecture, validation strategies, and complex system debug methodologies.
  • Strong collaboration, communication, and cross-team coordination skills.

Desirable 

  • Experience designing or deploying AI/ML rack-scale systems. 
  • Experience developing at-scale debug methodologies for hyperscale data center systems. 
  • Familiarity with data center infrastructure and emerging AI hardware technologies. 
  • Experience with rack integration testing and hyperscale deployment readiness. 
  • Knowledge of automated validation frameworks, test analytics, and continuous validation practices.

Top Skills

AI Systems
ARM
CXL
Firmware
Hardware
PCIe
SoC Design
Software
x86

The Company
HQ: Bristol
488 Employees
Year Founded: 2016

What We Do

At Graphcore, we’re building the future of AI compute. We’re a team of semiconductor, software, and AI experts with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale. As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems in a place where everyone has the opportunity to make an impact on the company, our products, and the future of artificial intelligence.

Why Work With Us

Our team is at the forefront of the machine intelligence revolution, enabling innovators from all industries to build AI-native products to expand human potential. What we do at Graphcore really makes a difference.

Graphcore Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

At Graphcore, we value wellbeing and flexibility to support a healthy work/life balance. Our hybrid approach encourages office-based colleagues to work onsite three days a week, with flexibility built on trust and transparency for everyone.

Typical time on-site: 3 days a week
Headquarters
Austin Office
Bengaluru Office
Cambridge Office
Gdańsk Office
Hsinchu Office
London Office

Date Posted

04/27/2026
