Job Description
Team: IT
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Staff Site Reliability Engineer in the United States.
This role offers a unique opportunity to lead the reliability, scalability, and operational excellence of a large-scale, globally distributed platform. You will shape infrastructure strategy, guide deployment automation, and ensure systems perform seamlessly across multi-region environments. The position blends hands-on engineering with high-level leadership, providing influence over both core product architecture and operational best practices. You will mentor senior engineers, drive IaC and observability standards, and partner cross-functionally with Product, Security, and Customer Success teams. This is a highly dynamic, remote-friendly environment where technical vision and proactive problem-solving directly impact customer satisfaction and organizational resilience.
Accountabilities:
- Lead the strategy and execution of infrastructure, deployment, and reliability initiatives across cloud, Kubernetes, and multi-region platforms
- Design, build, and maintain scalable, high-availability systems, automation tools, and core frameworks in Go and Python
- Serve as a technical leader and mentor for SREs and engineering teams, elevating practices across reliability, monitoring, and operational excellence
- Oversee Infrastructure as Code (IaC) implementation and continuous deployment systems to ensure consistency and reproducibility
- Act as the senior escalation point for high-severity incidents and implement systemic solutions to prevent recurring failures
- Define and enforce standards for observability, SLIs/SLOs, disaster recovery, chaos engineering, and production readiness
- Collaborate with Engineering leadership, Product, Security, and Customer Success to align infrastructure capabilities with business and customer needs
Requirements:
- 10+ years of experience in software engineering or site reliability roles with distributed systems
- Deep expertise with cloud platforms (AWS, GCP, Azure), Kubernetes, containerized environments, and CI/CD systems
- Strong coding skills in Go (preferred) and Python, capable of contributing to core frameworks and platform tooling
- Hands-on experience with observability platforms (Prometheus, Grafana, Loki) and multi-region architectures
- Experience managing infrastructure, SRE, or platform engineering teams supporting production systems
- Advanced understanding of microservices, event-driven architectures, networking, and security best practices
- Familiarity with databases such as MongoDB or Redis at scale
- Excellent leadership, mentorship, and communication skills with the ability to drive cross-functional initiatives
- Bonus: open-source contributions, U.S. security clearance eligibility, or prior experience with large-scale SaaS platforms
Benefits:
- Fully remote work with flexible hours
- Flexible paid time off, holidays, and vacation
- Company-provided laptop and remote work benefits
- Professional development through courses, books, and learning platforms
- Stock options and equity participation
- Multicultural and inclusive work environment
- Vibrant, collaborative company culture
Date Posted: 03/24/2026