Senior Distributed Systems Engineer

Luma · Peninsula

Company

Luma

Location

Peninsula

Type

Full Time

Job Description

We are looking for people with strong ML and distributed systems backgrounds. This role sits within our Research team and collaborates closely with researchers to build the platforms for training our next generation of foundation models.

Responsibilities

  • Work with researchers to scale up the systems required for our next generation of models, trained on multi-thousand-GPU clusters.
  • Profile and optimize our model training codebase to achieve best-in-class hardware efficiency.
  • Build systems to distribute work across massive GPU clusters efficiently.
  • Design and implement methods to robustly train models in the presence of hardware failures.
  • Build tooling to help us better understand problems in our largest training jobs.

Experience

  • 5+ years of work experience.
  • Experience working with multi-modal ML pipelines, high-performance computing, and/or low-level systems.
  • Passion for diving deep into systems implementations and understanding their fundamentals in order to improve their performance and maintainability.
  • Experience building stable and highly efficient distributed systems.
  • Strong generalist Python and software engineering skills, including significant experience with PyTorch.
  • Nice to have: experience working with high-performance C++ or CUDA.
  • Please note this role is not meant for recent grads.

Your application is reviewed by real people.

Date Posted

09/10/2024

