Site Reliability Engineer

Jobgether · India

Company

Jobgether

Location

India

Type

Full Time

Job Description

Team: IT

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in India.

This role is focused on building and maintaining highly reliable, scalable, and secure distributed systems that support global, large-scale cybersecurity platforms. You will be responsible for ensuring system availability, performance, and resilience across multi-region cloud environments. The position involves close collaboration with engineering and development teams to improve deployment processes, strengthen observability, and enhance infrastructure automation. You will work on Kubernetes-based architectures, cloud-native tooling, and CI/CD pipelines to support continuous delivery at scale. A strong emphasis is placed on incident response, root cause analysis, and proactive system optimization. This is a high-impact engineering role where reliability engineering directly supports mission-critical security services used by organizations worldwide.

Accountabilities:

  • Deploy, manage, and scale distributed systems across multi-region cloud environments with a focus on high availability and performance.
  • Design, maintain, and optimize Kubernetes-based infrastructure for large-scale production workloads.
  • Build and manage Helm charts to enable consistent, automated, and repeatable deployments.
  • Implement and maintain infrastructure-as-code solutions using tools such as Terraform and related automation frameworks.
  • Monitor system health using observability tools such as Grafana, Prometheus, and logging stacks, ensuring proactive issue detection and resolution.
  • Collaborate with development teams to improve CI/CD pipelines, deployment reliability, and production readiness.
  • Lead incident response, troubleshooting, and root cause analysis for production issues.
  • Develop and maintain runbooks, operational documentation, and best practices for system reliability.
  • Drive continuous improvements in scalability, performance, automation, and cloud infrastructure efficiency.
  • Support multi-region deployment strategies and global infrastructure optimization initiatives.
Requirements:

  • 4–5 years of experience in Site Reliability Engineering, DevOps, or similar infrastructure-focused roles.
  • Strong hands-on experience with Kubernetes in production environments.
  • Experience with cloud platforms, especially AWS (EKS, VPC, S3, IAM, ECR, and related services).
  • Solid understanding of infrastructure-as-code tools such as Terraform and Git-based workflows.
  • Experience building and managing CI/CD pipelines and automation systems.
  • Proficiency in Helm charts and containerized deployment strategies.
  • Strong scripting skills in Bash and familiarity with at least one programming language (Go or Python preferred).
  • Experience working with distributed systems, microservices architectures, and cloud-native ecosystems.
  • Strong debugging, troubleshooting, and problem-solving skills in production environments.
  • Experience with observability tools such as Grafana, Prometheus, Loki, or similar stacks.

Benefits:

  • Opportunity to work on global-scale, mission-critical cybersecurity infrastructure
  • Exposure to advanced cloud-native technologies and distributed systems architecture
  • Flexible and remote-friendly work environment
  • Strong focus on engineering excellence, ownership, and continuous learning
  • Global collaboration with high-performing engineering teams
  • Career growth in SRE, platform engineering, and cloud infrastructure domains
  • Inclusive, innovation-driven culture focused on impact and reliability

Date Posted

05/18/2026

© 2026 Job Transparency. All rights reserved.