Chief HPC Engineer

EPAM Systems · Barra do Garças, Brazil

Company

EPAM Systems

Location

Barra do Garças, Brazil

Type

Full Time

Job Description

We are seeking a Chief HPC Engineer with robust technical skills in HPC infrastructure to manage day-to-day operations and engineering activities within our HPC environment. The ideal candidate will have a strong engineering background with significant hands-on experience in deployment and optimization. This leadership role requires strategic oversight and a proactive approach to maintaining and enhancing system performance and reliability.

Responsibilities

  • Support and oversee HPC infrastructure
  • Implement Infrastructure as Code (IaC) for system automation
  • Lead incident resolution efforts, as well as software and hardware upgrades
  • Guide and mentor a team of HPC engineers
  • Strategize and implement system scalability and efficiency improvements
  • Ensure system security and compliance with industry standards
  • Develop and monitor key performance indicators to assess system health
  • Foster strong vendor relationships for system hardware and software procurement
  • Lead research and adoption of new technologies to keep the infrastructure at the cutting edge
  • Facilitate collaboration across departments to align HPC strategies with organizational goals
Requirements

  • Minimum of 7 years of experience as an HPC Engineer
  • At least 2 years of relevant leadership experience
  • Proficiency in Linux (any RPM-based distribution), including compiling kernel modules, analyzing core dumps, and using debugging tools like strace and tcpdump
  • Experienced in managing HPC job schedulers such as IBM LSF and Slurm
  • Skilled in configuring and implementing Bright Cluster Manager
  • Understanding of both GPFS and Lustre file systems
  • Familiarity with InfiniBand and Omni-Path network interconnect technologies
  • Fluent English communication skills at a C1 level or higher
Nice to have
  • Proficiency in hardware diagnostics, upgrades, and tuning, including InfiniBand HCAs and disk arrays from Lustre, VAST, and IBM
  • Capability to utilize infrastructure monitoring tools like Zabbix, Splunk, or Grafana
  • Understanding of EasyBuild
  • Experience working within a GxP environment
  • Familiarity with project and service management tools like Jira and ServiceNow

Date Posted

11/19/2024
