Site Reliability Engineer - Platform

C3 AI • Peninsula

Company

C3 AI

Location

Peninsula

Type

Full Time

Job Description

C3.ai, Inc. (NYSE:AI) is a leading Enterprise AI software provider for accelerating digital transformation. The proven C3 AI Platform provides comprehensive services to build enterprise-scale AI applications more efficiently and cost-effectively than alternative approaches. The C3 AI Platform supports the value chain in any industry with prebuilt, configurable, high-value AI applications for reliability, fraud detection, sensor network health, supply network optimization, energy management, anti-money laundering, and customer engagement. Learn more at: C3 AI

We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team to manage, monitor, and optimize our C3 clusters on Kubernetes. The ideal candidate will have a deep understanding of Kubernetes, cloud infrastructure, and Infrastructure as Code (IaC) practices. You will be responsible for ensuring the reliability and scalability of our Kubernetes clusters and cloud infrastructure.

Responsibilities:

  • Monitor and Manage Kubernetes Clusters: Ensure the stability, health, and scalability of Kubernetes clusters while deploying applications and services on them.
  • Kubernetes Management: Deploy, monitor, and scale applications on Kubernetes clusters. Maintain Helm charts, manage services, and ensure resource allocation for optimal cluster performance.
  • Cloud Infrastructure Management: Work with leading Cloud Platforms (AWS, GCP, Azure) to set up, configure, and manage infrastructure resources using Infrastructure as Code (Terraform, CloudFormation, etc.).
  • Monitoring & Incident Response: Set up monitoring solutions, define alerts, and manage the incident response process for any issues related to Jenkins, C3, or Kubernetes clusters.
  • Automate Infrastructure Processes: Build automation tools for scaling, monitoring, and maintaining infrastructure using modern tools like Terraform, Ansible, or equivalent.
  • Collaborate Across Teams: Work closely with development, services, and operations teams to ensure seamless integration between application development and infrastructure.
  • Security & Compliance: Ensure all systems follow security best practices and comply with relevant regulations. This includes role-based access, encryption, and automated vulnerability scanning.

Qualifications:

  • 3+ years of experience as an SRE, DevOps Engineer, or related role.
  • Hands-on experience with Kubernetes in production environments (managing clusters, deployments, services, and pods).
  • Proficiency in cloud platforms like AWS, GCP, or Azure, including managing infrastructure via IaC tools like Terraform, CloudFormation, or equivalent.
  • Familiarity with monitoring tools like Prometheus, Grafana, or equivalent.
  • Experience with Helm and managing Kubernetes applications via Helm charts.
  • Strong scripting and automation skills in languages like Bash, Python, or Groovy.
  • Experience with CI/CD tools, GitOps, and best practices for continuous integration and delivery pipelines.
  • Understanding of networking concepts and security best practices in a cloud-native environment.
  • Incident management experience, including setting up on-call rotations, managing runbooks, and post-incident reviews.

C3 AI provides excellent benefits, a competitive compensation package, and a generous equity plan.

California Pay Range

$129,000—$169,000 USD

C3 AI is proud to be an Equal Opportunity and Affirmative Action Employer. We do not discriminate on the basis of any legally protected characteristics, including disabled and veteran status. 

Date Posted

12/17/2024
