Platform Infrastructure Engineer
Company
Arcee AI
Location
Remote
Type
Full Time
Job Description
About Us:
Arcee.ai is a cutting-edge AI company that empowers enterprises to own their GenAI strategy. We're a team of passionate and innovative engineers, researchers, and industry experts dedicated to pushing the boundaries of AI technology. We're looking for an exceptional Platform Infrastructure Engineer to join our team and help design, develop, and deploy AI-powered solutions that meet the highest standards of quality, reliability, and performance.
About the role:
We’re looking for a Platform Infrastructure Engineer with a deep focus on Kubernetes and AWS EKS to build and scale the multi-tenant, multi-cluster infrastructure that hosts our SaaS products, enterprise products, and AI models. In this role, you’ll collaborate closely with a small, agile team to automate infrastructure provisioning, streamline deployment pipelines, and ensure the reliability and scalability of our platform. You’ll use tools such as ArgoCD, Atlantis, Terraform, Terragrunt, and the Grafana observability stack, and you’ll deploy and orchestrate GPUs, all to drive a GitOps-first approach and cultivate operational excellence.
What you’ll do:
- Architect, deploy, and maintain Kubernetes clusters on AWS EKS in a multi-tenant, multi-cluster environment that is portable to other cloud providers and VPCs
- Own our Infrastructure as Code practices using Terraform and Terragrunt, ensuring consistency and repeatability
- Implement and manage GitOps workflows with ArgoCD to enhance delivery pipelines
- Set up, configure, and maintain Atlantis for automated Terraform workflow management
- Collaborate with developers, DevOps, and product teams to improve deployment speeds and system reliability
- Take part in writing and reviewing technical documentation, providing best practices and guidance for the broader engineering team
- Troubleshoot and resolve issues across infrastructure and networking
- Help deploy, orchestrate, and monitor our GPUs
What we’re seeking:
- Experience deploying and orchestrating a Grafana observability stack (Alloy, Mimir, Loki, Tempo, Grafana) or a similar monitoring solution
- Experience deploying and orchestrating GPUs
- Proven experience running Kubernetes in production, and readiness to extend it to multi-cloud environments
- Hands-on expertise with Terraform and Terragrunt for Infrastructure as Code
- Familiarity with GitOps methodologies and ArgoCD for continuous deployment
- Experience managing multi-tenant, multi-cluster environments at scale
- Strong scripting and automation skills (e.g., Python, Bash, Go)
- Solid understanding of networking concepts and cloud infrastructure (AWS preferred, other cloud providers acceptable)
- Clear communication, a problem-solving mindset, and the ability to work effectively in a small, fast-moving team
Equal Opportunity
We are an Equal Opportunity Employer, offering equal opportunity to all regardless of race, religion, gender identity, sexual orientation, age, citizenship, marital status, disability, and more. We would like to remind candidates that the listed qualifications for each role are not hard requirements, and we encourage them to apply if they feel they would be a good fit.
Compensation
We offer competitive salaries, equity, and benefits. We base our salaries on location, role, and level as well as consideration of the candidate’s experience and overall qualifications.
Date Posted
01/24/2025