Senior Staff Site Reliability Engineer

Jobgether · US

Company

Jobgether

Location

US

Type

Full Time

Job Description

Team: IT

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Staff Site Reliability Engineer in the United States.

This role offers a unique opportunity to lead the reliability, scalability, and operational excellence of a large-scale, globally distributed platform. You will shape infrastructure strategy, guide deployment automation, and ensure systems perform seamlessly across multi-region environments. The position blends hands-on engineering with high-level leadership, providing influence over both core product architecture and operational best practices. You will mentor senior engineers, drive IaC and observability standards, and partner cross-functionally with Product, Security, and Customer Success teams. This is a highly dynamic, remote-friendly environment where technical vision and proactive problem-solving directly impact customer satisfaction and organizational resilience.

Accountabilities:

  • Lead the strategy and execution of infrastructure, deployment, and reliability initiatives across cloud, Kubernetes, and multi-region platforms
  • Design, build, and maintain scalable, high-availability systems, automation tools, and core frameworks in Go and Python
  • Serve as a technical leader and mentor for SREs and engineering teams, elevating practices across reliability, monitoring, and operational excellence
  • Oversee Infrastructure as Code (IaC) implementation and continuous deployment systems to ensure consistency and reproducibility
  • Act as the senior escalation point for high-severity incidents and implement systemic solutions to prevent recurring failures
  • Define and enforce standards for observability, SLIs/SLOs, disaster recovery, chaos engineering, and production readiness
  • Collaborate with Engineering leadership, Product, Security, and Customer Success to align infrastructure capabilities with business and customer needs

Requirements:

  • 10+ years of experience in software engineering or site reliability roles with distributed systems
  • Deep expertise with cloud platforms (AWS, GCP, Azure), Kubernetes, containerized environments, and CI/CD systems
  • Strong coding skills in Go (preferred) and Python, capable of contributing to core frameworks and platform tooling
  • Hands-on experience with observability platforms (Prometheus, Grafana, Loki) and multi-region architectures
  • Experience managing infrastructure, SRE, or platform engineering teams supporting production systems
  • Advanced understanding of microservices, event-driven architectures, networking, and security best practices
  • Familiarity with databases such as MongoDB or Redis at scale
  • Excellent leadership, mentorship, and communication skills with the ability to drive cross-functional initiatives
  • Bonus: open-source contributions, U.S. security clearance eligibility, or prior experience with large-scale SaaS platforms

Benefits:

  • Fully remote work with flexible hours
  • Flexible paid time off, holidays, and vacation
  • Company-provided laptop and remote work benefits
  • Professional development through courses, books, and learning platforms
  • Stock options and equity participation
  • Multicultural and inclusive work environment
  • Vibrant, collaborative company culture

Date Posted

03/24/2026
