Senior Systems Software Engineer, Containers and Kubernetes

NVIDIA · South Bay

Type

Full Time

Job Description

NVIDIA is looking for outstanding software and systems engineers to help us develop and operate our enterprise GPU infrastructure management systems across Clouds. In this role, you will work closely with the broader NVIDIA team to operate, design and build infrastructure management systems, Kubernetes operators, and end-to-end HPC integration solutions that combine GPUs with the rest of the datacenter software management ecosystem. We are focused on supporting NVIDIA products across HPC, Cloud, and enterprise on both bare metal and virtualized platforms as the role of GPUs in all of these environments expands. Your contributions will span many aspects of GPU systems management, including Cloud provisioning, observability, operations and incident response. The systems you operate will range from single-node developer systems to large clusters with thousands of nodes deployed on multiple Cloud providers.

To succeed, you must have a strong system and software development background, familiarity with modern distributed systems, especially the Cloud-native ecosystem, and a proven work ethic. This is a dynamic work environment with many exciting opportunities awaiting. NVIDIA GPUs are central to many hot enterprise, cloud, and datacenter trends. Come join us as we craft the future of accelerated computing and AI.

What you'll be doing:

  • Enable GPU provisioning and lifecycle with state-of-the-art Cloud-Native open-source ecosystem solutions, including Kubernetes, Docker, Prometheus, Terraform and Crossplane.

  • Develop, maintain and/or operate robust, scalable Go programs in a Kubernetes environment.

  • Develop the next-generation multi-cloud infrastructure management systems to support GenAI.

  • Support internal and external users through bug fixes, documentation, and feature improvements.

  • Maintain high-quality products through robust test coverage and Day 2 capabilities.

What we need to see:

  • BS or higher in Computer Science or equivalent experience.

  • 6+ years of meaningful industry experience with a strong Kubernetes and SRE background.

  • Deep understanding of all aspects of the software development lifecycle and the ability to execute on them.

  • Experience with OpenAPI and Kubernetes Custom Resource Definitions.

  • Business-level English and outstanding written and verbal interpersonal skills.

  • Strong motivation and commitment to learn new skills.

  • Ability to manage time in a fast, heavily multitasked environment.

Ways to stand out from the crowd:

  • Open-source contributions to the Cloud-Native community and an understanding of AI and LLM principles.

  • Strong experience with GitHub/GitLab CI/CD pipelines and application configuration.

  • Strong knowledge of container technologies, orchestration frameworks and observability systems.

  • Exposure to GPU programming with CUDA and familiarity with Kubernetes internals. Experience in developing Kubernetes operators.

  • Experience with managing and operating HPC schedulers and/or working across multiple Cloud providers.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you!

The base salary range is 180,000 USD - 339,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Date Posted

10/03/2024
