Linux System Engineer - HPC

KLA · Ann Arbor, MI

Company

KLA

Location

Ann Arbor, MI

Type

Full Time

Job Description

Company Overview
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA places an above-average focus on innovation, investing 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world's leading technology providers to accelerate the delivery of tomorrow's electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.

Group/Division
With over 40 years of semiconductor process control experience, KLA is the company chipmakers around the globe rely on to ensure that their fabs ramp next-generation devices to volume production quickly and cost-effectively. Enabling the movement towards advanced chip design, KLA's Global Products Group (GPG), which is responsible for creating all of KLA's metrology and inspection products, is looking for the best and the brightest research scientists, software engineers, application development engineers, and senior product technology process engineers. Central Engineering is KLA's largest engineering organization, comprising 9 Centers of Excellence (CoEs) in various disciplines applied across all product groups in the company. These CoEs include Handling & Automation, Precision Motion Control, Sensors & Image Acquisition, Platform Design, and Packaging Engineering, among others. Talent includes over 500 engineers across global centers in Israel, China, India, and the US. Each CoE contributes not only talent and deliverables in its discipline to product programs, but also subject matter expertise, best practices, roadmaps, specialized facilities, apparatus, models, and analytics. These differentiate KLA not only in WHAT we do, but also in HOW we do it.

The KLA Central Engineering Platform team is driving technology forward by providing leading-edge HPC environments for physics modeling as well as ML/DL algorithm development. Our products are engineered for security, reliability and scalability, running the full stack from infrastructure to applications to devices and hardware.

In this role, you will contribute to architectural design, deployment, and support as we establish the organization that will take HPC at KLA from infancy to enterprise scale through digital transformation initiatives. You will identify and assess our developers' requirements, uncover and recommend solutions, and plan and drive those solutions to production.

Responsibilities

  • Plan, build, mount, install and upgrade modern HPC systems with CPUs and GPUs.
  • Install and enable pre-release hardware for early-stage evaluation and prototyping.
  • Study the feasibility and value of migrating to new versions and distributions of Linux on existing hardware.
  • Enable the exploration of modern HPC software for adoption into KLA's tools.
  • Administer existing HPC clusters with CPUs, GPUs, and accelerators (in-house and at customer sites).
  • Own CI/CD pipelines that ensure stability of system-level HPC software deployed to internal customers.
  • Oversee the company's security, backup and redundancy strategies.
  • Coordinate with appropriate vendors for equipment purchasing, shipping, and RMAs.
  • Produce detailed documentation, including as-built documents and Visio drawings.
  • Support our installed base onsite at customer sites, with 10% travel to resolve issues onsite.
  • Support and conduct internal audits, help mitigate findings and implement improvement measures.
  • Hands-on experience with IaaS (Terraform or similar technologies) and PaaS.
  • Strong understanding of TCP/IP fundamentals and knowledge of protocols such as DNS, DHCP, HTTP, LDAP, and SMTP.

Minimum Qualifications
  • 10+ years of previous experience deploying and administering Linux servers
  • Deep understanding of operating systems, computer networks, and high performance applications
  • Strong scripting skills (Bash, Python)
  • Solid understanding of Docker or other container-based systems
  • Experience with system management and monitoring tools such as Prometheus, Grafana, and collectd (and its plugins)
  • Ability to work well with developers & test engineers
  • Passion for maintaining a stable, high-quality cluster against all odds!
  • Multitasking: ability to organize, coordinate, manage, prioritize, and perform multiple tasks simultaneously, and to swiftly assess a situation, determine a logical course of action, and apply the appropriate response.
  • Bachelor's degree with 5+ years' work experience, OR Master's degree with 3+ years' work experience, OR PhD with 0 years' work experience

Things That Make Us Go Wow!

  • Experience deploying DL frameworks such as TensorFlow and PyTorch
  • Experience in handling large-scale distributed HPC clusters
  • Experience with supporting pre-release hardware and open-source operating systems.

COVID-19 Vaccination Requirement: Proof of full COVID-19 vaccination is required where permitted by law. KLA will consider reasonable accommodation as provided by applicable law. Please note that accommodation may not be possible where vaccination is required for an essential function of the position, including for international travel or customer site access.

The company offers a competitive and comprehensive benefits package including but not limited to the following: medical, dental, vision, life, and other voluntary benefits, 401(k) including company matching, employee stock purchase program (ESPP), student debt assistance, tuition reimbursement program, financial planning benefits, employee assistance program (EAP), paid time off and paid company holidays, family care and bonding leave.

KLA is proud to be an Equal Opportunity Employer. We do not discriminate on the basis of race, religion, color, national origin, sex, gender identity, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other status protected by applicable law. We will ensure that qualified individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us at [email protected] to request accommodation.

Date Posted

10/14/2022
