Staff Software Engineer - Grafana Databases, Managed Services
Job Description
Team: IT
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Software Engineer – Grafana Databases, Managed Services in the United Kingdom.
In this role, you will operate at the intersection of large-scale distributed systems, streaming infrastructure, and cloud database platforms, helping power mission-critical observability services used globally. You will be responsible for the reliability, scalability, and performance of multi-cloud infrastructure that underpins high-throughput metrics, logs, and traces systems. Working in a deeply technical, remote-first engineering environment, you will influence architecture decisions while remaining hands-on in production systems. Your work will directly impact the stability and efficiency of large-scale data pipelines operating across hundreds of clusters. This is a high-autonomy role where you will partner with platform and database teams to solve complex distributed systems challenges. You will also play a key role in shaping operational excellence, reliability practices, and long-term system evolution across global infrastructure.
Accountabilities
In this role, you will take ownership of large-scale streaming and database infrastructure, ensuring reliability, scalability, and performance across hundreds of production clusters while driving architectural improvements and operational excellence.
- Operate and evolve large-scale multi-cloud streaming and database infrastructure across production environments
- Diagnose and resolve complex cross-layer failures involving storage, compute, networking, and control-plane systems
- Design and implement safe rollout, upgrade, and migration strategies across distributed systems at scale
- Improve observability, automation, and operational tooling to reduce system toil and increase reliability
- Define and evolve SLOs, error budgets, and reliability standards for shared infrastructure systems
- Partner with engineering teams to optimize query performance, data partitioning, and system scalability
- Serve as a primary escalation point for high-severity incidents and lead deep root cause analysis efforts
- Drive long-term architectural improvements to reduce systemic risks across multi-cluster environments
- Mentor engineers and contribute to best practices in distributed systems engineering and operational excellence
Requirements
You bring deep expertise in distributed systems, infrastructure engineering, or platform engineering, with strong experience operating high-scale production systems in cloud environments. You are highly technical, autonomous, and comfortable leading complex initiatives across global teams.
- 8+ years of software engineering experience in SRE, platform engineering, infrastructure, or distributed systems roles
- Strong experience with large-scale streaming or database systems (e.g., Kafka, Redpanda, ClickHouse, Cassandra, or similar)
- Hands-on expertise with Kubernetes in AWS, GCP, or Azure environments
- Proficiency in infrastructure-as-code tools such as Terraform, Helm, or similar
- Strong programming skills in systems-oriented languages (Go preferred)
- Deep understanding of distributed systems behavior, failure modes, and performance trade-offs
- Experience with observability, incident response, and writing post-incident reviews
- Strong knowledge of Linux internals, networking, storage systems, and cloud architecture
- Proven ability to lead technical initiatives and influence architectural decisions without formal authority
- Excellent communication skills with the ability to work effectively in remote, cross-functional teams
Benefits
- Competitive compensation package including base salary, bonus (where applicable), and equity (RSUs)
- Fully remote-first working model with global collaboration across distributed teams
- 30 days annual leave, including designated shutdown days for full disconnection
- Equity ownership in the company’s long-term success through RSU participation
- Access to modern AI development tools with company-supported usage budgets
- Strong emphasis on autonomy, trust, and outcome-driven engineering culture
- Career growth opportunities in a fast-scaling global infrastructure organization
- Exposure to cutting-edge distributed systems and large-scale observability platforms
- Inclusive, transparent, and highly collaborative engineering environment
Date Posted
04/15/2026