Senior Machine Learning Engineer (Inference Platform)
Job Description
Team: IT
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Machine Learning Engineer (Inference Platform) in the United States.
In this role, you will take ownership of the production inference systems that power a high-scale AI-driven conversational shopping experience. You will be responsible for building, operating, and optimizing the end-to-end ML serving infrastructure, ensuring models run reliably under real-world production load. The position sits at the intersection of machine learning, distributed systems, and platform engineering, with a strong focus on performance, scalability, and cost efficiency. You will collaborate closely with ML engineers, data teams, product, and DevOps to bring models from experimentation into production seamlessly. This is a high-impact role where your architectural decisions directly shape user experience and system performance. You will work in a fast-paced, startup-like environment where ownership and technical depth are essential. The role offers significant autonomy in defining the future of the inference platform.
Accountabilities:
You will be responsible for building and scaling the core infrastructure that serves machine learning models in production, ensuring reliability, efficiency, and observability across all inference workflows.
- Own and evolve a multi-engine inference platform supporting LLMs, embedding models, and other ML workloads in production environments
- Build and maintain production-grade ML serving pipelines, from model packaging and deployment to monitoring and lifecycle management
- Define and enforce SLAs for latency, throughput, availability, GPU utilization, and token-level performance metrics such as time to first token (TTFT) and inter-token latency (ITL)
- Design and implement model versioning, rollout, rollback, and reproducibility strategies for safe and scalable deployments
- Develop observability, monitoring, alerting, and debugging tools for production inference systems
- Optimize inference performance through batching strategies, GPU utilization, quantization, and hardware-aware system design
- Ensure secure, scalable, and cost-efficient ML serving infrastructure across cloud environments
- Partner cross-functionally with ML, data, product, and DevOps teams to translate research into production-ready systems
Requirements:
The ideal candidate brings deep experience in production ML systems, strong software engineering fundamentals, and hands-on expertise with large-scale inference infrastructure.
- 5–8+ years of experience in ML engineering, software engineering, or platform/infrastructure roles with ownership of production ML systems
- Hands-on experience operating LLM serving frameworks such as vLLM, TGI, TensorRT-LLM, or SGLang in real production environments
- Strong Python skills and solid understanding of distributed systems and backend engineering principles
- Experience with cloud platforms (AWS, GCP, or Azure) and ML lifecycle tooling, including model registries and deployment systems
- Deep understanding of inference optimization concepts such as KV caching, batching strategies, GPU memory behavior, and latency bottlenecks
- Experience supporting heterogeneous ML workloads including LLMs, embeddings, and extraction models
- Strong ability to balance latency, throughput, reliability, and infrastructure cost trade-offs
- Experience working in fast-paced, high-growth environments with evolving technical requirements
- Excellent problem-solving, communication, and collaboration skills across technical and non-technical teams
Benefits:
- Competitive compensation aligned with experience and impact
- Remote-first flexibility within the United States
- Opportunity to shape core AI infrastructure powering a large-scale consumer-facing product
- High ownership role with influence over architecture and technical direction
- Collaborative, cross-functional engineering environment
- Exposure to cutting-edge LLM and AI inference technologies
- Fast-paced startup culture with strong autonomy and technical depth
Date Posted
06/05/2026