Infrastructure Engineer (GPU & Compute)

Jobgether · US

Company: Jobgether
Location: US
Type: Full Time

Job Description

Team: IT

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an Infrastructure Engineer (GPU & Compute) in the United States.

This role is at the core of building and scaling high-performance infrastructure designed for modern AI and machine learning workloads. You will work across hardware, systems, and software layers to ensure GPU-enabled environments are reliable, efficient, and production-ready from day one. The position combines deep technical expertise with hands-on ownership of image pipelines, system validation, and large-scale compute environments. You will play a critical role in enabling seamless deployment and operation of cutting-edge AI infrastructure by improving automation, diagnostics, and performance. Collaborating with cross-functional teams, you will help bring new systems online, validate next-generation hardware, and enhance operational efficiency. This is a high-impact opportunity within a fast-paced, innovation-driven environment focused on scaling compute for the future of AI.

Accountabilities:

  • Own and evolve systems for image management, deployment, and validation across large-scale bare-metal and GPU-enabled infrastructure environments.
  • Maintain and operate validation clusters used for system diagnostics, testing, and infrastructure bring-up to ensure readiness and reliability.
  • Lead GPU diagnostics and validation workflows, identifying performance bottlenecks, failure patterns, and system-level issues across hardware and software layers.
  • Build and enhance automation tools and workflows (primarily in Python) to streamline provisioning, validation, and operational processes.
  • Support hardware qualification efforts for new platforms, including firmware, drivers, and operating system validation.
  • Manage Linux-based production and validation environments, including virtualization and bare-metal provisioning systems (e.g., PXE workflows).
  • Collaborate with infrastructure, hardware, data center, and ML teams to align systems with workload requirements and ensure optimal performance.
  • Contribute to best practices for infrastructure lifecycle management, system diagnostics, and scalability improvements.

Requirements:

  • 5+ years of experience in infrastructure engineering, systems engineering, or related technical roles.
  • Strong expertise in Linux systems administration within production or large-scale environments.
  • Hands-on experience with GPU-enabled systems and performance/monitoring tools such as NVIDIA DCGM.
  • Solid understanding of bare-metal provisioning, system bring-up processes, and image-based deployment workflows.
  • Proficiency in Python or similar programming/scripting languages for building automation tools.
  • Demonstrated ability to troubleshoot complex issues across hardware, operating systems, GPUs, and system software layers.
  • Familiarity with hardware management interfaces such as IPMI, iDRAC, or Redfish.
  • Experience working with data center infrastructure and physical hardware environments is highly valued.
  • Bonus: Experience with high-performance interconnects (InfiniBand, NVLink), AI/ML or HPC workloads, and large-scale hardware validation frameworks.

Benefits:

  • Competitive base salary ranging from $180,000 to $200,000 USD, based on experience and location.
  • Performance-based bonus and meaningful equity participation.
  • Comprehensive medical, dental, and vision coverage.
  • Retirement and financial wellness programs.
  • Generous paid time off, holidays, and paid parental leave.
  • Flexible remote or hybrid work options within the United States.
  • Professional development support and learning opportunities.
  • Wellness and home office stipends.
  • Inclusive and collaborative work environment focused on innovation and balance.

Date Posted: 05/05/2026
