Platform Infrastructure Engineer

Arcee AI • Remote

Company

Arcee AI

Location

Remote

Type

Full Time

Job Description

About Us:
Arcee.ai is a cutting-edge AI company that empowers enterprises to own their GenAI strategy. We're a team of passionate and innovative engineers, researchers, and industry experts dedicated to pushing the boundaries of AI technology. We're looking for an exceptional Platform Infrastructure Engineer to join our team and help design, develop, and deploy AI-powered solutions that meet the highest standards of quality, reliability, and performance.


About the role:

We’re looking for a Platform Infrastructure Engineer with a deep focus on Kubernetes and AWS EKS to build and scale the multi-tenant, multi-cluster infrastructure that hosts our SaaS products, enterprise products, and AI models. In this role, you’ll collaborate closely with a small, agile team to automate infrastructure provisioning, streamline deployment pipelines, and ensure the reliability and scalability of our platform. You’ll use tools such as ArgoCD, Atlantis, Terraform, Terragrunt, and the Grafana observability stack to drive a GitOps-first approach and cultivate operational excellence, and you’ll help deploy and orchestrate GPUs.

What you’ll do:

  • Architect, deploy, and maintain Kubernetes clusters on AWS EKS in a multi-tenant, multi-cluster environment that is portable to other cloud providers and VPCs
  • Own our Infrastructure as Code practices using Terraform and Terragrunt, ensuring consistency and repeatability
  • Implement and manage GitOps workflows with ArgoCD to enhance delivery pipelines
  • Set up, configure, and maintain Atlantis for automated Terraform workflow management
  • Collaborate with developers, DevOps, and product teams to improve deployment speed and system reliability
  • Write and review technical documentation, sharing best practices and guidance with the broader engineering team
  • Troubleshoot and resolve issues across infrastructure and networking
  • Help deploy, orchestrate, and monitor our GPUs


What we’re seeking:

  • Experience deploying and orchestrating a Grafana observability stack (Alloy, Mimir, Loki, Tempo, Grafana) or a similar monitoring solution
  • Experience deploying and orchestrating GPUs
  • Proven experience with Kubernetes in production and readiness to tackle multi-cloud environments
  • Hands-on expertise with Terraform and Terragrunt for Infrastructure as Code
  • Familiarity with GitOps methodologies and ArgoCD for continuous deployment
  • Experience managing multi-tenant, multi-cluster environments at scale
  • Strong scripting and automation skills (e.g., Python, Bash, Go)
  • Solid understanding of networking concepts and cloud infrastructure (AWS preferred, other cloud providers acceptable)
  • Clear communication, a problem-solving mindset, and the ability to work effectively in a small, fast-moving team

Equal Opportunity

We are an Equal Opportunity Employer, offering equal opportunity to all regardless of race, religion, gender identity, sexual orientation, age, citizenship, marital status, disability, and more. We would like to remind candidates that the listed qualifications for each role are not hard requirements, and we encourage them to apply if they feel they would be a good fit.

Compensation

We offer competitive salaries, equity, and benefits. We base our salaries on location, role, and level as well as consideration of the candidate’s experience and overall qualifications.

Date Posted

01/24/2025
