Staff Infrastructure Engineer — Observability

Jobgether · US

Company

Jobgether

Location

US

Type

Full Time

Job Description

Team: IT

This position is listed on behalf of a partner company, who manages all applications and next steps. Our partner is looking for a Staff Infrastructure Engineer — Observability based in the United States.

You will join a high-impact engineering organization responsible for building and operating the observability backbone of a large-scale, cloud-native platform. This role sits at the intersection of infrastructure, reliability, and platform engineering, with ownership over systems that enable real-time visibility into global production environments. You will design and evolve telemetry platforms that power detection, debugging, and performance optimization across distributed systems. Working across AWS and GCP environments, you will help define how large-scale observability is built, standardized, and automated. This is a deeply technical, hands-on role where architectural decisions directly influence engineering velocity and system reliability. You will also collaborate closely with senior engineers and cross-functional teams to ensure observability is embedded into every layer of the platform.

Accountabilities:

Own the design, architecture, and evolution of large-scale observability systems supporting distributed, cloud-native infrastructure, ensuring high reliability, scalability, and performance.

  • Build and optimize telemetry platforms using tools such as Prometheus, Grafana, Thanos/Mimir/Cortex, and OpenTelemetry pipelines
  • Architect scalable data ingestion, storage, and analysis systems for high-volume production environments
  • Drive observability strategy across engineering teams, defining standards for monitoring, logging, and tracing
  • Develop automation and self-service tooling to reduce operational overhead and improve engineering efficiency
  • Lead reliability improvements across multi-cloud environments (AWS and GCP), balancing performance, cost, and resilience
  • Own incident response, root-cause analysis, and continuous improvement of production observability systems
  • Mentor engineers, lead technical design reviews, and elevate engineering best practices across teams

Requirements:

This role requires deep experience in infrastructure engineering and large-scale distributed systems, with strong expertise in observability platforms and cloud environments.

  • 8+ years of experience in Infrastructure Engineering, Site Reliability Engineering, or similar roles
  • Strong hands-on expertise with Prometheus, Grafana, Thanos/Mimir/Cortex, and OpenTelemetry
  • Experience designing and operating cloud-native systems in AWS or GCP environments
  • Proficiency with Kubernetes-based production environments (EKS, GKE, or equivalent)
  • Strong infrastructure-as-code skills using Terraform and Ansible
  • Experience building scalable, high-throughput distributed systems with a focus on reliability and cost efficiency
  • Strong programming experience in Go or similar languages (Python, Java), with willingness to work in Go
  • Experience leading technical architecture, mentoring engineers, and collaborating across product and platform teams
  • Familiarity with secure or regulated environments such as FedRAMP or government-compliant systems is highly valued

Benefits:

  • Competitive base salary range aligned with experience and location
  • Equity participation through restricted stock units
  • Employee stock purchase program
  • Comprehensive medical, dental, and vision insurance
  • 401(k) retirement plan with employer match
  • Flexible time off, paid holidays, and sick leave
  • Parental leave and family support benefits
  • Wellness, mental health, and lifestyle stipends
  • Home office and technology support allowances
  • Professional development and continuous learning opportunities
  • Additional voluntary benefits including life, disability, and legal coverage

Date Posted

07/02/2026
