Staff Engineer, GitLab Delivery - Operate

GitLab • North America, Latin America, EMEA

Type

Full Time

Job Description

An overview of this role

As a Staff Engineer within the GitLab Operate team, you'll lead the technical direction for GitLab's self-managed deployment strategy, with a particular focus on zero-downtime upgrades and operational excellence at scale. This is a high-impact technical leadership role where you'll architect and implement the systems that enable thousands of organizations to deploy, upgrade, and operate GitLab reliably in their own infrastructure.

You'll be the technical anchor for our newly formed Operate team, driving the evolution of GitLab's deployment tooling from traditional packaging approaches toward cloud-native, operator-driven automation. Your work will directly impact GitLab's ability to deliver new features to self-managed customers faster while dramatically reducing operational complexity and upgrade friction.

The GitLab Operate team serves as a critical bridge between GitLab engineering and our self-managed customers, ensuring our products are easily deployable, secure, and scalable across a range of environments, from single-node VM deployments to large-scale Kubernetes clusters supporting tens of thousands of users.

What you’ll do

Technical Leadership & Architecture

Lead the technical vision and architecture for GitLab's cloud-native self-managed deployments and zero-downtime upgrades, balancing operational simplicity, customer needs, and engineering constraints to improve upgrade success rates and reduce customer downtime
Establish and champion operational maturity standards, service integration patterns, and deployment models; provide cross-group technical leadership on complex initiatives so development teams can own the end-to-end lifecycle of their components while reducing upgrade risk and operational complexity

Platform Engineering & Development

Design, implement, and maintain production-grade Kubernetes Operators, Helm charts, and upgrade orchestration tooling that automate lifecycle management for self-managed GitLab deployments at varying scales, reducing manual operations and configuration errors
Develop integration and automation frameworks that handle database migrations, rolling deployments, compatibility checks, and rollbacks, enabling product teams to ship new services with standardized deployment patterns and predictable release outcomes

Database & Application Lifecycle Management

Define and evolve database and application lifecycle strategies (including safe PostgreSQL migrations, compatibility layers, and pre-flight/validation checks) to minimize downtime, failed upgrades, and rollback frequency across self-managed environments

Cross-Functional Collaboration & Enablement

Collaborate with Product Management, GitLab.com SRE, GitLab Dedicated, and development teams to define integration requirements, align deployment patterns and operational practices, and translate customer needs into scalable self-managed solutions
Mentor and enable engineers and customer-facing teams through design reviews, code reviews, documentation, and runbooks that improve deployment reliability, supportability, and customer success

Production Operations & Reliability

Define and implement observability, testing, performance, and resilience practices for self-managed deployments (including metrics, logging, alerting, benchmarks, capacity guidance, and failure-handling patterns), and contribute to incident response and post-mortems to continuously improve reliability, upgrade success rates, and mean time to recovery (MTTR)

What you’ll bring

Required Experience & Skills

8+ years of software engineering experience, with at least 3 years in platform engineering or infrastructure roles
Expert-level Go proficiency (Ruby and Rails as a plus), with demonstrated ability to work in large, complex codebases
Production Kubernetes experience, including:
Building and maintaining Kubernetes Operators
Designing Helm charts for complex stateful applications
Understanding of Custom Resource Definitions (CRDs), admission controllers, and controller patterns
Experience with stateful workloads, persistent volumes, and storage classes
Cloud-native architecture experience, including service mesh, observability stacks, and infrastructure as code
Experience shipping production software that customers install and operate in their own infrastructure
Understanding of Linux systems, including package management, systemd, and system-level debugging

Highly Valued Experience

Experience building or maintaining Operators for complex stateful applications (databases, message queues, etc.)
Ruby on Rails expertise and understanding of Rails application architecture
Infrastructure automation using Terraform, Ansible, or similar tools
Background in Site Reliability Engineering or DevOps with production on-call experience
Understanding of compliance and security requirements for enterprise software deployments
Experience with observability platforms
Open source contribution history, particularly in infrastructure or deployment tooling

Technical Leadership Qualities

Technical influence and communication: Ability to design holistic solutions balancing multiple constraints, write clear technical proposals and documentation, and work across teams, influencing without direct authority
Team development and execution: Track record of mentoring and elevating team capabilities through teaching and code review, combined with pragmatic decision-making and a bias for action when facing incomplete information

What Makes You Stand Out (Strong Plus)

You've built Kubernetes Operators in production and dealt with the operational complexities of stateful workload management
You have PostgreSQL knowledge, including schema design and experience with database migrations at scale, and understand the tradeoffs between downtime and complexity
You've shipped cloud-native software that customers can run on-premises
You contribute to open source infrastructure projects and understand community dynamics
You can explain complex technical concepts clearly to both technical and non-technical audiences
You have experience with zero-downtime deployment strategies for monolithic applications transitioning to microservices
You've been on-call for production systems and understand what makes software operable

About the team

The Operate team is part of GitLab Delivery and focuses on delivering GitLab to self-managed users through supported and validated tooling. This includes maintaining and evolving the GitLab Omnibus package, Helm Charts, GitLab Operator, and the GitLab Environment Toolkit (GET).

We partner with SRE, Release, Security, and Development teams to ensure GitLab is easily deployable, supportable, and production-ready in diverse environments, from small single-node deployments to large enterprise-scale Kubernetes clusters.

Current challenges we're tackling

Zero-downtime upgrades: Enabling self-managed customers to upgrade GitLab without service interruption
Operational complexity: Reducing the burden of managing GitLab at scale while expanding our service architecture
Cloud-native transition: Building the next generation of deployment tooling while supporting existing customers
Upgrade velocity: Reducing the time it takes for self-managed customers to adopt new releases

Team structure

You'll be joining a newly consolidated Operate team that is building the capability to deliver GitLab's expanding service architecture to self-managed customers. As a Staff Engineer, you'll work closely with the engineering manager and product manager to define technical direction while mentoring other engineers on the team.

Date Posted

11/30/2025
