Senior Machine Learning Engineer (Inference Platform)

Company: Jobgether
Location: US
Type: Full Time

Job Description

Team: IT

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Machine Learning Engineer (Inference Platform) in the United States.

In this role, you will take ownership of the production inference systems that power a high-scale AI-driven conversational shopping experience. You will be responsible for building, operating, and optimizing the end-to-end ML serving infrastructure, ensuring models run reliably under real-world production load. The position sits at the intersection of machine learning, distributed systems, and platform engineering, with a strong focus on performance, scalability, and cost efficiency. You will collaborate closely with ML engineers, data teams, product, and DevOps to bring models from experimentation into production seamlessly. This is a high-impact role where your architectural decisions directly shape user experience and system performance. You will work in a fast-paced, startup-like environment where ownership and technical depth are essential. The role offers significant autonomy in defining the future of the inference platform.

Accountabilities:

You will be responsible for building and scaling the core infrastructure that serves machine learning models in production, ensuring reliability, efficiency, and observability across all inference workflows.

  • Own and evolve a multi-engine inference platform supporting LLMs, embedding models, and other ML workloads in production environments
  • Build and maintain production-grade ML serving pipelines, from model packaging and deployment to monitoring and lifecycle management
  • Define and enforce SLAs for latency, throughput, availability, GPU utilization, and token-level performance metrics such as TTFT and ITL
  • Design and implement model versioning, rollout, rollback, and reproducibility strategies for safe and scalable deployments
  • Develop observability, monitoring, alerting, and debugging tools for production inference systems
  • Optimize inference performance through batching strategies, GPU utilization, quantization, and hardware-aware system design
  • Ensure secure, scalable, and cost-efficient ML serving infrastructure across cloud environments
  • Partner cross-functionally with ML, data, product, and DevOps teams to translate research into production-ready systems

Requirements:

The ideal candidate brings deep experience in production ML systems, strong software engineering fundamentals, and hands-on expertise with large-scale inference infrastructure.

  • 5–8+ years of experience in ML engineering, software engineering, or platform/infrastructure roles with ownership of production ML systems
  • Hands-on experience operating LLM serving frameworks such as vLLM, TGI, TensorRT-LLM, or SGLang in real production environments
  • Strong Python skills and solid understanding of distributed systems and backend engineering principles
  • Experience with cloud platforms (AWS, GCP, or Azure) and ML lifecycle tooling, including model registries and deployment systems
  • Deep understanding of inference optimization concepts such as KV caching, batching strategies, GPU memory behavior, and latency bottlenecks
  • Experience supporting heterogeneous ML workloads including LLMs, embeddings, and extraction models
  • Strong ability to balance latency, throughput, reliability, and infrastructure cost trade-offs
  • Experience working in fast-paced, high-growth environments with evolving technical requirements
  • Excellent problem-solving, communication, and collaboration skills across technical and non-technical teams

Benefits:

  • Competitive compensation aligned with experience and impact
  • Remote-first flexibility within the United States
  • Opportunity to shape core AI infrastructure powering a large-scale consumer-facing product
  • High ownership role with influence over architecture and technical direction
  • Collaborative, cross-functional engineering environment
  • Exposure to cutting-edge LLM and AI inference technologies
  • Fast-paced startup culture with strong autonomy and technical depth

Date Posted: 06/05/2026
