Sr Software Engineer - HPC Clusters, SLURM, ML

Mavensoft Technologies · Portland OR

Company

Mavensoft Technologies

Location

Portland OR

Type

Full Time

Job Description

Job Title : Sr Software Engineer - HPC Clusters, SLURM, ML
Location : 100% Remote
Duration : One year (possibility of extensions)
Pay rate : $75 (W2) plus benefits

Key Skills : HPC clusters, cluster administration, cluster tech stack, SLURM, BeeGFS, ML training (GPU+CPU), AI-centric workloads, simulation, 3D graphics.

Job Description :
The client is looking for an experienced cluster administrator to manage HPC clusters. The right candidate will have experience with SLURM and related technologies and will be familiar with machine learning training and inference workloads (GPU and CPU).

Job Responsibilities :
• Serve as the primary contact for a GPU+CPU cluster
• Collect team feedback and relay it to the support team (schedule downtimes/maintenance, propose changes to the cluster, etc.)
• Perform capacity planning to help determine the team's compute/storage needs moving forward
• Serve as the owner of the SLURM job scheduler, defining the configuration that best fits the team and developing/enabling advanced features
• Serve as the team datasets owner (manage the datasets that live in the cluster and how people access them)
• Help the team optimize/troubleshoot complex jobs/pipelines (AI-centric, simulation, 3D graphics, etc.)
• Educate the team on how to use the cluster (SLURM, BeeGFS, datasets, etc.), enabling fast ramp-up for new scientists and engineers (via tutorials, presentations, wiki docs, etc.)

Required skills and experience :
• Experience designing and managing large clusters with heterogeneous HW (CPUs, GPUs, etc.)
• User-centric and results-oriented. You can learn from data what the needs of our scientists/engineers will be and can produce a cluster growth plan to fulfill those needs.
• Power user. You are willing to extensively test the different workflows that run in the cluster and help optimize them.
• Cluster tech stack. You are an expert in cluster orchestration and management, familiar with technologies such as SLURM, BeeGFS, Docker, etc. (or you are willing to learn them quickly).
• Good communication skills. You can communicate effectively with a variety of stakeholders, including presenting plans to higher management and having technical discussions with engineers/scientists.

Minimum Educational Requirement : BS degree or higher

Website: www.mavensoft.com

Date Posted

10/13/2022

