Job Description
Team: IT
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in India.
This role is focused on building and maintaining highly reliable, scalable, and secure distributed systems that support global, large-scale cybersecurity platforms. You will be responsible for ensuring system availability, performance, and resilience across multi-region cloud environments. The position involves close collaboration with engineering and development teams to improve deployment processes, strengthen observability, and enhance infrastructure automation. You will work on Kubernetes-based architectures, cloud-native tooling, and CI/CD pipelines to support continuous delivery at scale. A strong emphasis is placed on incident response, root cause analysis, and proactive system optimization. This is a high-impact engineering role where reliability engineering directly supports mission-critical security services used by organizations worldwide.
Accountabilities:
- Deploy, manage, and scale distributed systems across multi-region cloud environments with a focus on high availability and performance.
- Design, maintain, and optimize Kubernetes-based infrastructure for large-scale production workloads.
- Build and manage Helm charts to enable consistent, automated, and repeatable deployments.
- Implement and maintain infrastructure-as-code solutions using tools such as Terraform and related automation frameworks.
- Monitor system health using observability tools such as Grafana, Prometheus, and logging stacks, ensuring proactive issue detection and resolution.
- Collaborate with development teams to improve CI/CD pipelines, deployment reliability, and production readiness.
- Lead incident response, troubleshooting, and root cause analysis for production issues.
- Develop and maintain runbooks, operational documentation, and best practices for system reliability.
- Drive continuous improvements in scalability, performance, automation, and cloud infrastructure efficiency.
- Support multi-region deployment strategies and global infrastructure optimization initiatives.
Requirements:
- 4–5 years of experience in Site Reliability Engineering, DevOps, or similar infrastructure-focused roles.
- Strong hands-on experience with Kubernetes in production environments.
- Experience with cloud platforms, especially AWS (EKS, VPC, S3, IAM, ECR, and related services).
- Solid understanding of infrastructure-as-code tools such as Terraform and Git-based workflows.
- Experience building and managing CI/CD pipelines and automation systems.
- Proficiency in Helm charts and containerized deployment strategies.
- Strong scripting skills in Bash and familiarity with at least one programming language (Go or Python preferred).
- Experience working with distributed systems, microservices architectures, and cloud-native ecosystems.
- Strong debugging, troubleshooting, and problem-solving skills in production environments.
- Experience with observability tools such as Grafana, Prometheus, Loki, or similar stacks.
Benefits:
- Opportunity to work on global-scale, mission-critical cybersecurity infrastructure.
- Exposure to advanced cloud-native technologies and distributed systems architecture.
- Flexible and remote-friendly work environment.
- Strong focus on engineering excellence, ownership, and continuous learning.
- Global collaboration with high-performing engineering teams.
- Career growth in SRE, platform engineering, and cloud infrastructure domains.
- Inclusive, innovation-driven culture focused on impact and reliability.
Date Posted: 05/18/2026