Chief HPC Engineer
Company
EPAM Systems
Location
Barra do Garças, Brazil
Type
Full Time
Job Description
We are seeking a Chief HPC Engineer with robust technical skills in HPC infrastructure to manage day-to-day operations and engineering activities within our HPC environment. The ideal candidate will have a strong engineering background with significant hands-on experience in deployment and optimization. This leadership role requires strategic oversight and a proactive approach to maintaining and enhancing system performance and reliability.
Responsibilities
- Support and oversee HPC infrastructure
- Implement Infrastructure as Code (IaC) for system automation
- Lead incident resolution efforts, as well as software and hardware upgrades
- Guide and mentor a team of HPC engineers
- Strategize and implement system scalability and efficiency improvements
- Ensure system security and compliance with industry standards
- Develop and monitor key performance indicators to assess system health
- Foster strong vendor relationships for system hardware and software procurement
- Lead research and adoption of new technologies to keep the infrastructure at the cutting edge
- Facilitate collaboration across departments to align HPC strategies with organizational goals
Requirements
- Minimum of 7 years of experience as an HPC Engineer
- At least 2 years of relevant leadership experience
- Proficiency in Linux (any RPM-based distribution), including compiling kernel modules and using debugging tools like strace, coredump, and tcpdump
- Experienced in managing HPC job schedulers such as IBM LSF and Slurm
- Skilled in configuring and implementing Bright Cluster Manager
- Understanding of both GPFS and Lustre file systems
- Familiarity with InfiniBand and Omni-Path network interconnect technologies
- Fluent English communication skills at a C1 level or higher
- Proficiency in hardware diagnostics, upgrades, and tuning, including InfiniBand HCAs and disk arrays from Lustre, VAST, and IBM
- Capability to utilize infrastructure monitoring tools like Zabbix, Splunk, or Grafana
- Understanding of EasyBuild
- Experience working within a GxP environment
- Familiarity with project and service management tools like Jira and ServiceNow
Date Posted
11/19/2024