Senior Site Reliability Engineer

Jobgether · US

Company

Jobgether

Location

US

Type

Full Time

Job Description

Team: IT

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.

This role is focused on ensuring the reliability, scalability, and performance of a modern, cloud-native platform that supports privacy, security, and data-driven services at enterprise scale. You will act as a senior technical owner of production stability, working closely with engineering, security, and developer experience teams to embed strong reliability practices across the software lifecycle. The environment is fast-moving and highly collaborative, requiring a balance of hands-on engineering and strategic thinking. You will help define and evolve SRE standards, turning incidents and operational learnings into long-term systemic improvements. This is a high-impact position where your work directly influences platform resilience, customer experience, and engineering efficiency. It offers the opportunity to shape observability, incident response, and infrastructure strategy in a remote-first organization.

Accountabilities:

  • Lead reliability design and production readiness reviews for services, ensuring strong observability, safe deployments, and rollback strategies
  • Build, operate, and improve observability systems including logging, metrics, tracing, dashboards, alerts, and runbooks for incident response
  • Own incident management processes, including on-call participation, escalation handling, post-incident reviews, and long-term remediation tracking
  • Design and execute disaster recovery testing, game days, and resilience exercises to validate system robustness and reduce failure points
  • Perform capacity planning and cloud cost optimization to ensure scalable, efficient, and high-performing infrastructure
  • Identify systemic reliability risks and drive cross-team initiatives to reduce incidents and improve platform stability
  • Collaborate with engineering and security teams to integrate reliability practices into CI/CD pipelines, tooling, and development workflows
  • Continuously improve on-call operations, automation, alerting quality, and operational documentation

Requirements:

  • 5+ years of experience in Site Reliability Engineering, Production Engineering, Infrastructure Engineering, or similar production-focused roles
  • Strong hands-on experience with cloud infrastructure (ideally AWS), including compute, networking, storage, and security services
  • Proficiency in at least one programming language such as Python, JavaScript, or TypeScript, with the ability to review and understand production code
  • Experience with infrastructure as code and CI/CD tools such as Terraform, CloudFormation, or equivalent platforms
  • Deep knowledge of observability tools (e.g., Datadog or similar), including alert design, monitoring strategies, and incident signal management
  • Proven experience leading incident response, root cause analysis, and postmortem processes with actionable outcomes
  • Strong communication and collaboration skills, with the ability to influence across engineering teams without formal authority
  • Experience participating in or improving on-call rotations, escalation workflows, and operational readiness practices
  • Bachelor’s degree in a technical field or equivalent practical experience
  • Ability to thrive in a remote, high-autonomy environment with strong ownership and execution discipline

Benefits:

  • Competitive salary aligned with experience and location
  • Equity participation as part of total compensation package
  • Flexible remote-first work environment
  • Comprehensive health, dental, and vision insurance
  • 401(k) retirement plan with company match
  • Flexible PTO and paid parental leave
  • Home office support and remote work stipend
  • Strong learning culture with growth and development opportunities

Date Posted

04/29/2026
