IBM Research takes responsibility for technology and its role in society. Working in IBM Research means you'll join a team who invent what's next in computing, always choosing the big, urgent, and mind-bending work that endures and shapes generations. Our passion for discovery and excitement for defining the future of tech are what build our strong culture around solving problems for clients and seeing the real-world impact that you can make.
IBM's product and technology landscape includes Research Software and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
We are looking for a talented and highly motivated intern to help advance our efforts in building autonomous data management systems. The work will include using foundation models (FMs) and AI agents for tasks such as data discovery, knowledge representation, data access and retrieval with querying, and automated data-driven insights. This is the core of the role: turning FMs and AI agents into dependable partners for real data work across modern data stacks.
Our focus is on research and development for data workflow orchestration, taking natural language all the way to trusted insights across multiple tools and functions. This includes step-by-step planning and reasoning for complex data workflows; developing low-computational-cost inference techniques so that FMs and AI agents can efficiently automate data tasks or assist users with them; and advancing trustworthy data agents, where gauging the factuality, faithfulness, and transparency of outputs over structured and unstructured data is critical. In this context, uncertainty quantification and mitigation, along with improved planning and reasoning, play a central role: they provide useful tools to strengthen user confidence, manage model error propagation, and enable uncertainty-aware post-training. Together, these efforts make agentic solutions more efficient, accurate, and trustworthy.
Skills and tasks of interest include:
- [LLM for code generation] Using foundation models for code generation specific to data tasks, such as SQL or NoSQL queries for data retrieval and Python code for analytical insights.
- [Knowledge Graphs, Multi-Modal FMs] Combining foundation models, knowledge graphs, and multi-modal structured and unstructured data to improve data discovery and automated Text-to-SQL.
- [FM Inference] Improving FM inference for both answer quality and computational cost.
- [LLMs for DataOps] Creating generative-AI tooling for DataOps (e.g., data integration and data flows), analogous to DevOps accelerators but for data engineering and analytics.
- [Efficient and Reliable AI Agents] Creating efficient AI agents that can reliably operate as part of an autonomous system.
- Pursuing graduate studies in computer science or a related field
- At least one main-author research publication at a top AI or data management conference, such as NeurIPS, AAAI, VLDB, SIGMOD, IJCAI, ICML, ICLR, or ICAPS
- Familiarity with, and working expertise in, large language models
- Familiarity with knowledge graphs, SQL, RAG, and agentic frameworks
- Familiarity with reinforcement learning and AI planning
- Familiarity with prompt optimization techniques