Platform Infrastructure Engineer
Company
Arcee AI
Location
Remote
Type
Full Time
Job Description
About Us:
Arcee.ai is a cutting-edge AI company that empowers enterprises to own their GenAI strategy. We're a team of passionate and innovative engineers, researchers, and industry experts dedicated to pushing the boundaries of AI technology. We're looking for an exceptional Platform Infrastructure Engineer to join our team and help design, develop, and deploy AI-powered solutions that meet the highest standards of quality, reliability, and performance.
About the role:
We’re looking for a Platform Infrastructure Engineer with a deep focus on Kubernetes and AWS EKS to build and scale the multi-tenant, multi-cluster infrastructure that hosts our SaaS products, enterprise products, and AI models. In this role, you’ll collaborate closely with a small, agile team to automate infrastructure provisioning, streamline deployment pipelines, and ensure the reliability and scalability of our platform. You’ll use tools like ArgoCD, Atlantis, Terraform, Terragrunt, and the Grafana observability stack, and you’ll deploy and orchestrate GPUs, all in support of a GitOps-first approach and operational excellence.
What you’ll do:
- Architect, deploy, and maintain Kubernetes clusters on AWS EKS in a multi-tenant, multi-cluster environment that is portable to other cloud providers and VPCs
- Own our Infrastructure as Code practices using Terraform and Terragrunt, ensuring consistency and repeatability
- Implement and manage GitOps workflows with ArgoCD to enhance delivery pipelines
- Set up, configure, and maintain Atlantis for automated Terraform workflow management
- Collaborate with developers, DevOps, and product teams to improve deployment speeds and system reliability
- Take part in writing and reviewing technical documentation, providing best practices and guidance for the broader engineering team
- Troubleshoot and resolve issues across infrastructure and networking
- Help deploy, orchestrate, and monitor our GPUs
What we’re seeking:
- Experience deploying and orchestrating a Grafana observability stack (Alloy, Mimir, Loki, Tempo, Grafana) or a similar monitoring solution
- Experience deploying and orchestrating GPUs
- Proven experience with Kubernetes in production and readiness to tackle multi-cloud environments
- Hands-on expertise with Terraform and Terragrunt for Infrastructure as Code
- Familiarity with GitOps methodologies and ArgoCD for continuous deployment
- Experience managing multi-tenant, multi-cluster environments at scale
- Strong scripting and automation skills (e.g., Python, Bash, Go)
- Solid understanding of networking concepts and cloud infrastructure (AWS preferred, other cloud providers acceptable)
- Clear communication, problem-solving mindset, and the ability to work effectively in a small, fast-moving team
Equal Opportunity
We are an Equal Opportunity Employer, offering equal opportunity to all regardless of race, religion, gender identity, sexual orientation, age, citizenship, marital status, disability, and more. We would like to remind candidates that the listed qualifications for each role are not hard requirements, and we encourage them to apply if they feel they would be a good fit.
Compensation
We offer competitive salaries, equity, and benefits. We base our salaries on location, role, and level as well as consideration of the candidate’s experience and overall qualifications.
Date Posted
01/24/2025