Senior Machine Learning Systems Engineer (Training Optimization)
Company
Canva
Location
China
Type
Full Time
Job Description
Company Description
About the Group/Team
We're the CORE team within the Generative AI supergroup. Our mission is to invent foundational technologies that will power the future of AI-assisted design. From large-scale models to groundbreaking research, our team builds the technical core of Canva’s creative intelligence engine. We collaborate globally to ship research that makes a real impact at massive scale, from smart editing to AI video tools.
Job Description
About the Role/Specialty
As a Senior Machine Learning Systems Engineer, you’ll lead efforts to scale and optimize the training system for our large-scale multimodal and foundation models. You’ll design distributed training systems using Megatron-LM, NVIDIA NeMo, FSDP, and Triton, pushing the limits of performance across the compute, memory, and communication layers. You'll sit at the intersection of systems and AI research, directly shaping how we train the models that will power Canva’s next generation of products.
What you’ll do (responsibilities)
- You’ll design, implement, and optimize large-scale machine learning systems for training and inference.
- You’ll improve all aspects of performance, including GPU utilization, communication overhead, and memory efficiency.
- You’ll partner with research and modeling teams to align systems with algorithmic needs.
- You’ll evaluate and apply best practices for distributed training using industry-leading frameworks.
- You’ll dive deep into low-level optimization, including custom CUDA or Triton kernels.
- You’ll debug, profile, and fine-tune training workflows to unlock new levels of scalability.
Qualifications
What we're looking for
We’re looking for a systems-first engineer who thrives in fast-paced, high-impact environments. You’re deeply familiar with distributed model training at scale and understand the nuances of optimizing compute at every level of the stack. You're excited by challenges that stretch current boundaries, and you’re a strong collaborator who communicates clearly across domains.
- Strong background in LLMs, multimodal AI, or diffusion models.
- Proficiency in Python. Familiarity with a systems programming language (e.g., C++ or Rust) is a plus.
- Deep knowledge of PyTorch or JAX, as well as libraries such as Megatron-LM, NeMo, or DeepSpeed.
- Familiarity with common optimization techniques such as FSDP/ZeRO, gradient checkpointing, or low-precision data types.
- Hands-on experience writing custom GPU kernels in CUDA or Triton.
- Excellent communication and problem-solving skills, including full proficiency in English.
Date Posted
11/15/2025