Senior Staff Site Reliability Engineer at Jobgether

Team: IT

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Staff Site Reliability Engineer in the United States.

This role offers a unique opportunity to lead the reliability, scalability, and operational excellence of a large-scale, globally distributed platform. You will shape infrastructure strategy, guide deployment automation, and ensure systems perform seamlessly across multi-region environments. The position blends hands-on engineering with high-level leadership, providing influence over both core product architecture and operational best practices. You will mentor senior engineers, drive Infrastructure as Code (IaC) and observability standards, and partner cross-functionally with Product, Security, and Customer Success teams. This is a highly dynamic, remote-friendly environment where technical vision and proactive problem-solving directly impact customer satisfaction and organizational resilience.

Accountabilities:

Lead the strategy and execution of infrastructure, deployment, and reliability initiatives across cloud, Kubernetes, and multi-region platforms
Design, build, and maintain scalable, high-availability systems, automation tools, and core frameworks in Go and Python
Serve as a technical leader and mentor for SREs and engineering teams, elevating practices across reliability, monitoring, and operational excellence
Oversee Infrastructure as Code (IaC) implementation and continuous deployment systems to ensure consistency and reproducibility
Act as the senior escalation point for high-severity incidents and implement systemic solutions to prevent recurring failures
Define and enforce standards for observability, SLIs/SLOs, disaster recovery, chaos engineering, and production readiness
Collaborate with Engineering leadership, Product, Security, and Customer Success to align infrastructure capabilities with business and customer needs

Requirements:

10+ years of experience in software engineering or site reliability roles with distributed systems
Deep expertise with cloud platforms (AWS, GCP, Azure), Kubernetes, containerized environments, and CI/CD systems
Strong coding skills in Go (preferred) and Python, capable of contributing to core frameworks and platform tooling
Hands-on experience with observability platforms (Prometheus, Grafana, Loki) and multi-region architectures
Experience managing infrastructure, SRE, or platform engineering teams supporting production systems
Advanced understanding of microservices, event-driven architectures, networking, and security best practices
Familiarity with databases such as MongoDB or Redis at scale
Excellent leadership, mentorship, and communication skills with the ability to drive cross-functional initiatives
Bonus: open-source contributions, U.S. security clearance eligibility, or prior experience with large-scale SaaS platforms

Benefits:

Fully remote work with flexible hours
Flexible paid time off, holidays, and vacation
Company-provided laptop and remote work benefits
Professional development through courses, books, and learning platforms
Stock options and equity participation
Multicultural and inclusive work environment
Vibrant, collaborative company culture
