Director, Continuous Bringup and Optimization - NVIS

NVIDIA · Remote

Company

NVIDIA

Location

Remote

Type

Full Time

Job Description

NVIDIA seeks a data center infrastructure optimization and resiliency team manager to join its infrastructure specialist team. Academic and commercial groups worldwide use NVIDIA products to redefine deep learning and data analytics and to power data centers. Join the team building many of the world's largest and fastest data centers! NVIDIA is looking for someone who can lead a customer team responsible for production AI infrastructure and workflow optimization. The work includes complex, customer-focused operations optimization and related problem-solving; planning, facilitating, and executing continuous improvement events using NVIDIA telemetry tools; and interfacing with company stakeholder management, which requires excellent interpersonal skills. This role will involve interacting with customers, partners, and internal teams to analyze, define, and implement large-scale data center infrastructure optimization. These efforts call for practical leadership experience with data center teams, systems, networks, cloud operation and orchestration, AI workload resiliency, and performance optimization, along with a commitment to continual and efficient planning, operation, validation services, and team performance.

What you will be doing:

  • Manage regional, customer-dedicated teams focused on optimizing customer infrastructure and enhancing resiliency.
  • Lead a team that inspects and observes infrastructure and AI workloads to ensure system health and performance.
  • Establish and refine optimization workflows, collaborate with customers and analytics partners, and analyze results to improve AI workload production processes.
  • Work closely with customers and NVIDIA teams to prioritize, frame, and implement system improvements related to customer health and operational process evolution.
  • Partner with development, tools, and support teams to optimize GPU and infrastructure utilization, ensuring efficient capacity consumption.
  • Offer technical guidance and oversight for systems and networking activities. Serve as the primary manager across all initiatives, allocate team schedules, prioritize tasks, and provide feedback and direction on complex technical issues.
  • Work closely with the customer IT infrastructure teams to design and implement data center network changes, accommodating new and changing requirements.
  • Ensure deployment risks are minimized across regional activities to maintain operational integrity.
  • Establish, supervise, and continuously improve processes to prevent infrastructure, systems, and services failures.
  • Guarantee that all tasks are completed with high quality, avoiding negative impacts on internal/external users and business operations.
  • Plan and implement ongoing improvements to deployment methodologies for greater effectiveness and efficiency.
  • Ensure all operational KPIs and metrics are tracked and met. Maintain a strong commitment to service quality and user experience, striving for continuous improvement.
  • Keep informed of developments in NVIDIA products, particularly in data center facilities, systems, and networking, and provide recommendations to address current and future needs.
  • Ensure that documentation, policies, procedures, and guidelines are in place for systems, resources, and activities.
  • Supervise, evaluate, mentor, and coach team members, fostering a culture of continuous learning and professional growth.

What we need to see:

  • 10+ years overall of demonstrable service operations management experience in enterprise-level data centers, with continual infrastructure and service improvement.
  • 7+ years of people management experience.
  • Data center, server, and network-related certification (preferred).
  • Bachelor's degree or equivalent experience.
  • In-depth practical knowledge and experience of data center environments, servers, network equipment, operations, and services.
  • Extensive experience in installing, monitoring, and maintaining data center equipment.
  • Analytical Attitude & Problem Solving - able to analyze information, problems, situations, practices, and/or procedures; collect and interpret data; reason logically; establish facts; identify and define existing and potential issues; recognize the interrelationships among elements; draw valid conclusions; develop recommendations and alternative courses of action; select the appropriate course; follow up; and evaluate.
  • Exceptional ability to work as part of a team, provide IT support, and resolve errors.
  • Organization & Time Management - able to plan, schedule, and organize tasks related to the job to achieve goals within or ahead of established time frames.
  • Willingness to travel (25%).

Ways to stand out from the crowd:

  • Experience with data center operations processes, safety, and security measures.
  • Knowledge of data center infrastructure.
  • Outstanding social skills.

Date Posted

12/04/2024
