At IBM, work is more than a job - it’s a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you’ve never thought possible. Are you ready to lead in this new era of technology and solve some of the world’s most challenging problems? If so, let’s talk.
The internship will explore advanced routing strategies and KV-cache–aware optimizations in distributed inference systems, with an emphasis on improving performance, scalability, and GPU cost efficiency.
What you will work on
- Designing and evaluating routing algorithms to optimize inference latency, throughput, and cost
- Investigating KV cache management strategies for large-scale distributed inference serving
- Prototyping, benchmarking, and analyzing inference optimization techniques
- Working with modern inference frameworks and real production-like workloads
Why join us?
This internship offers a unique opportunity to work at the intersection of AI systems and distributed infrastructure, with real-world impact on scalable, cost-efficient inference serving used in production environments.
Required qualifications
- MSc or PhD student in Computer Science, Machine Learning, Systems, or a related field
- Strong background or interest in distributed systems, systems research, or ML infrastructure
- Strong programming skills (Python, Go, or similar)
- Hands-on experience or familiarity with vLLM (architecture, KV cache behavior, scheduling, or extensions)
- Interest in AI infrastructure, performance optimization, and cost efficiency
- Ability to work independently while collaborating effectively within a research and engineering team
Preferred qualifications
- Experience with Kubernetes (K8s) and cloud-native systems
- Familiarity with inference serving stacks, networking, or GPU-based systems
- Experience with benchmarking, profiling, or performance analysis

Please include your grade sheet with your application.