IT - InfiniBand GPU, Sr Systems Engineer (Atlanta, Georgia)
Job Description
At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology.
Responsibilities:
- Responsible for assisting with all projects and repairs throughout the data center.
- Participate in an on-call rotation and provide hands-on coverage during maintenance.
- Direct and perform tasks related to solving operational issues within the data center.
- Analyze and design operations that improve workflow, plan equipment layout, and help prevent accidents.
- Support operations, including the physical layout of equipment.
- Support customer deployments and ensure on-time bring-up of GPU servers.
- Perform InfiniBand fabric bring-up, configuration, and subnet management on the IB switch.
- Document existing operational processes and equipment.
- Utilize a framework for monitoring tools, escalating key issues, and ensuring timely service implementation.
- Diagnose, troubleshoot, install, and repair all software, hardware, and components.
- Install, perform basic configuration of, and troubleshoot networking equipment (routers and switches).
- Good understanding of the OSI model and the TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP, SMTP, FTP, TFTP).
- Configure terminal servers for out-of-band management.
- Manage daily issues, including health checks of servers and processes, working closely with end users, development teams, and infrastructure teams to prioritize, resolve, and mitigate outages.
- Server installation and maintenance (rack and stack, labeling, HDDs, memory, CPUs, RAID batteries, NICs, etc.).
- Review design documentation and validate equipment deployment against plans.
- Network installation and maintenance (rack and stack, labeling, cabling, parts replacement, etc.).
- Support site builds and refreshes while meeting current quality standards.
- Interact with onsite staff and vendors for hardware replacement, delivery, and diagnostics.
- Perform operational tasks associated with data center implementations, migrations, deployments, cabling, and rack and stack.
Requirements:
- Experience with cluster bring-up, including driver installation and loading.
- Experience with end-to-end GPU testing in an InfiniBand cluster.
- Experience setting up GPU servers in a cluster.
- Experience in Linux environments and proficiency in tasks such as shell scripting.
- Excellent data center organization skills and meticulous attention to detail.
- Familiarity with fiber and copper network cabling, including IP and SAN deployments.
- Responsible for maintaining acceptable ticket loads and incident SLAs.
- Follow documented escalation procedures.
- Sync with global teams on various tasks and upcoming initiatives.
- Understand and adhere to documented policies, processes, and procedures.
- Assist with process improvement initiatives and documentation of policies, processes, and procedures, including runbooks.
- Able to move 50+ pounds.
We're doing work that matters. Help us solve what others can't.
Date Posted: 01/24/2025