AI/ML HPC Principal Engineer

Chan Zuckerberg Biohub · San Francisco, CA

Type

Full Time

Job Description

The Chan Zuckerberg Biohub San Francisco (CZ Biohub SF) (https://www.czbiohub.org/sf/) is an independent nonprofit research institute that brings together three powerhouse universities - Stanford, UC Berkeley, and UC San Francisco - into a single collaborative technology and discovery engine. CZ Biohub SF itself supports some of the brightest, boldest engineers, data scientists, and biomedical researchers to investigate the fundamental mechanisms underlying disease and develop new technologies that will lead to actionable diagnostics and effective therapies. We are guided by our values of scholarly excellence; disruptive innovation; hands-on engineering/hacking/building; partnership and collaboration; open communication and respect; inclusiveness; and opportunity for all.

Our Vision

  • We pursue large scientific challenges that cannot be pursued in conventional environments
  • We enable individual investigators to pursue their riskiest and most innovative ideas
  • The technologies developed at CZ Biohub San Francisco facilitate research by scientists and clinicians at our home institutions and beyond

Diversity of thought, ideas, and perspectives is at the heart of the CZ Biohub Network and enables disruptive innovation and scholarly excellence. We are committed to cultivating an inclusive organization where all colleagues feel inspired and know their work makes an important contribution.

The Opportunity

The Chan Zuckerberg Biohub Network has an immediate opening for an AI/ML High Performance Computing (HPC) Principal Engineer. The CZ Biohub Network is composed of several new institutes that the Chan Zuckerberg Initiative created to do great science that cannot be done in conventional environments. The CZ Biohub Network brings together researchers from across disciplines to pursue audacious, important scientific challenges. The Network consists of four institutes throughout the country: San Francisco, Silicon Valley, Chicago, and New York City. Each institute collaborates closely with the major universities in its local area. Along with the world-class engineering team at the Chan Zuckerberg Initiative, the CZ Biohub supports several hundred of the brightest, boldest engineers, data scientists, and biomedical researchers in the country, with the mission of understanding the mysteries of the cell and how cells interact within systems.

The Biohub is expanding its global scientific leadership, particularly in the area of AI/ML, with the acquisition of the largest GPU cluster dedicated to AI for biology. The AI/ML HPC Principal Engineer will be tasked with helping to realize the full potential of this capability, in addition to providing advanced computing capabilities and consulting support to science and technical programs. This position will work closely with many different science teams simultaneously to translate experimental descriptions into software and hardware requirements across all phases of the scientific lifecycle, including data ingest, analysis, management and storage, computation, authentication, tool development, and many other computing needs expressed by scientific projects.

This position reports to the Director for Scientific Computing and will be hired at a level commensurate with the skills, knowledge, and abilities of the successful candidate.

What You'll Do

  • Work with a wide community of scientific disciplinary experts to identify emerging and essential information technology needs and translate those needs into information technology requirements
  • Build an on-prem HPC infrastructure supplemented with cloud computing to support the expanding IT needs of the Biohub
  • Support the efficiency and effectiveness of capabilities for data ingest, data analysis, data management, data storage, computation, identity management, and many other IT needs expressed by scientific projects
  • Plan, organize, track and execute projects
  • Foster cross-domain community and knowledge-sharing between science teams with similar IT challenges
  • Research, evaluate and implement new technologies on a wide range of scientific compute, storage, networking, and data analytics capabilities
  • Promote the use of cloud compute services (primarily AWS and GCP), containerization tools, and similar technologies to scientific clients and research groups, and assist researchers in using them
  • Work on problems of diverse scope where analysis of data requires evaluation of identifiable factors
  • Assist in cost & schedule estimation for the IT needs of scientists, as part of supporting architecture development and scientific program execution
  • Support Machine Learning capability growth at the CZ Biohub
  • Provide scientist support in deployment and maintenance of developed tools
  • Plan and execute all above responsibilities independently with minimal intervention

What You'll Bring 

Essential –

  • Bachelor’s Degree in Biology or Life Sciences is preferred. Degrees in Computer Science, Mathematics, Systems Engineering, or a related field, or equivalent training/experience, are also acceptable.
  • A minimum of 8 years of experience designing and building web-based working projects using modern languages, tools, and frameworks
  • Experience building on-prem HPC infrastructure and capacity planning
  • Experience and expertise working on complex issues where analysis of situations or data requires an in-depth evaluation of variable factors
  • Experience supporting scientific facilities, and prior knowledge of scientific user needs, program management, data management planning or lab-bench IT needs
  • Experience with HPC and cloud computing environments
  • Ability to interact with a variety of technical and scientific personnel with varied academic backgrounds
  • Strong written and verbal communication skills to present and disseminate scientific software developments at group meetings
  • Demonstrated ability to reason clearly about load, latency, bandwidth, performance, reliability, and cost and make sound engineering decisions balancing them
  • Demonstrated ability to quickly and creatively implement novel solutions and ideas

Technical experience includes -

  • Proven ability to analyze, troubleshoot, and resolve complex problems that arise in HPC production compute, interconnect, storage hardware, software systems, and storage subsystems
  • Configuring and administering parallel and network-attached storage (Lustre, GPFS on ESS, NFS, Ceph) and storage subsystems (e.g. IBM, NetApp, DataDirect Networks, LSI, VAST)
  • Installing, configuring, and maintaining job management tools (such as SLURM, Moab, TORQUE, or PBS) and implementing fairshare, node sharing, backfill, and similar scheduling policies for compute and GPUs
  • Red Hat Enterprise Linux, CentOS, or derivatives, and Linux services and technologies such as dnsmasq, systemd, LDAP, PAM, sssd, OpenSSH, and cgroups
  • Scripting languages (including Bash, Python, or Perl)
  • OpenACC, nvhpc, and an understanding of CUDA driver compatibility issues
  • Virtualization (ESXi or KVM/libvirt), containerization (Docker or Singularity), configuration management and automation (tools like xCAT, Puppet, Kickstart), and orchestration (Kubernetes, docker-compose, CloudFormation, Terraform)
  • High performance networking technologies (Ethernet and InfiniBand) and hardware (Mellanox and Juniper)
  • Configuring, installing, tuning, and maintaining scientific application software (Modules, Spack)
  • Familiarity with source control tools (Git or SVN)
  • Experience supporting the use of popular ML frameworks such as PyTorch and TensorFlow
  • Familiarity with cybersecurity tools, methodologies, and best practices for protecting systems used for science
  • Experience with movement, storage, backup, and archiving of large-scale data

Nice to have - 

  • An advanced degree is strongly desired

The Chan Zuckerberg Biohub requires all employees, contractors, and interns, regardless of work location or type of role, to provide proof of full COVID-19 vaccination, including a booster vaccine dose, if eligible, by their start date. Those who are unable to get vaccinated or obtain a booster dose because of a disability, or who choose not to be vaccinated due to a sincerely held religious belief, practice, or observance must have an approved exception prior to their start date.

Compensation 

  • $212,000 - $291,500

New hires are typically hired into the lower portion of the range, enabling employee growth in the range over time. To determine starting pay, we consider multiple job-related factors including a candidate’s skills, education and experience, market demand, business needs, and internal parity. We may also adjust this range in the future based on market data. Your recruiter can share more about the specific pay range during the hiring process.

What We Provide

  • Resources to disrupt and innovate at the frontiers of our knowledge of biology and disease
  • A collegial and collaborative environment consisting of diverse expertise
  • Existing collaborations within CZ Biohub: Technology Platforms (Bioengineering, Computational Microscopy, Data Science, Genomics & Mass Spectrometry), Infectious Disease, and Quantitative Cell Science
  • Access to collaborators, resources and facilities at our three partner universities (Stanford, UC Berkeley, and UC San Francisco) and at partner organizations in the Bay Area and beyond
  • Competitive compensation and benefits commensurate with experience

Benefits

We offer a robust benefits program that enables the important work Biohubbers do every day. Our benefits include healthcare coverage, life and disability insurance, commuter subsidies, family planning services with fertility care, a childcare stipend, 401(k) match, flexible time off, and a generous parental leave policy. In addition, we honor our commitment to career development and our value of scholarly excellence through regular onsite opportunities to learn from the world's leading scientists.

The CZ Biohub Network is an equal opportunity employer committed to diversity of thought, ideas and perspectives. We are committed to cultivating an inclusive organization where all Biohubbers feel inspired and know their work makes an important contribution. Therefore, we provide employment opportunities without regard to age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, or any other protected status in accordance with applicable law.

Pursuant to the California Fair Chance Act, we will consider for employment qualified applicants with arrest and conviction records.

Headhunters and recruitment agencies may not submit resumes/CVs through this website or directly to managers. The CZ Biohub Network does not accept unsolicited headhunter and agency resumes. The CZ Biohub Network will not pay fees to any third-party agency or company that does not have a signed agreement with the CZ Biohub Network.

 

Date Posted

10/29/2023
