Senior Site Reliability Engineer, Data Stores: DBRE

GitLab · North America

Company

GitLab

Location

North America

Type

Full Time

Job Description

Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other GitLab production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our environments and the GitLab codebase. We specialize in systems, whether it be networking, the Linux kernel, or some more specific interest in scaling, algorithms, or distributed systems.

The Database Reliability Team's mission is to build, run, and own the entire lifecycle of the PostgreSQL database engine for GitLab.com. The team is focused on owning the reliability, scalability, performance, and security of the database engine and its supporting services. The team should seek to build its services on top of Reliability::Foundations services and cloud vendor managed products where appropriate, to reduce complexity, improve efficiency, and deliver new capabilities more quickly.

GitLab.com is a unique site and it brings unique challenges: it’s the biggest GitLab instance in existence. In fact, it’s one of the largest single-tenancy open-source SaaS sites on the internet. The experience of our team feeds back into other engineering groups within the company, as well as to GitLab customers running self-managed installations.

Responsibilities

  • Automating every operational task is a core requirement for this role. For example: package updates, configuration changes across all environments, creating tools for automatic provisioning of user-facing services, etc.

  • Responding to platform emergencies, alerts, and escalations from Customer Support.

  • Ensure systems exist to manage software life-cycles (e.g. Operating Systems) with a minimum of manual effort.

  • Develop a fully automated multi-environment observability stack based on the existing SaaS system and extend it to predict capacity needs based on the usage patterns.

  • Plan for new service roll-outs, expansion, and capacity management of existing services, and work with users to optimise their resource consumption.

As an SRE you will:

  • Work on database reliability and performance aspects for GitLab.com from within the SRE team as well as work on shipping solutions with the product.

  • Analyze solutions and implement best practices for our main PostgreSQL database cluster and its components.

  • Work on observability of relevant database metrics and make sure we reach our database objectives.

  • Work with peer SREs to roll out changes to our production environment and help mitigate database-related production incidents.

  • Provide on-call support on rotation with the team.

  • Provide database expertise to engineering teams (for example, through reviews of database migrations, queries, and performance optimizations).

  • Work on automation of database infrastructure and help engineering succeed by providing self-service tools.

  • Use the GitLab product to run GitLab.com as a first resort and improve the product as much as possible.

  • Plan the growth of GitLab's database infrastructure.

  • Design, build, and maintain core database infrastructure pieces that allow GitLab to scale to support hundreds of thousands of concurrent users.

  • Support and debug database production issues across services and levels of the stack.

  • Make monitoring and alerting trigger on symptoms rather than on outages.

  • Document every action so your learnings turn into repeatable actions and then into automation.

You may be a fit for this role if you:

  • Have strong experience running PostgreSQL in large production environments and a solid understanding of the internals of PostgreSQL

  • Have strong experience with infrastructure automation and configuration management (Chef, Ansible, Puppet, Terraform…)

  • Have a solid understanding of SQL and PL/pgSQL

  • Have significant experience working in a large SaaS distributed systems production environment

  • Share our values and work in accordance with those values.

  • Have an urge to document all the things so you don't need to learn the same thing twice, and an urge to deliver quickly and iterate fast.

  • Have a proactive, go-for-it attitude. When you see something broken, you can't help but fix it.

  • Have strong data modeling and data structure design skills

  • Bonus: strong programming skills as a (former) backend engineer, preferably with Ruby and/or Go.

Projects you could work on:

  • Review, analyze, and implement solutions regarding database administration (e.g. backups, performance tuning).

  • Work with Terraform, Chef, and other tools to build mature automation (automatic setup of new replicas, or testing and monitoring of backups).

  • Implement self-service tools for our engineers using GitLab ChatOps.

  • Provide technical assistance and support to other teams on database and database-related application design methodologies, system resources, and application tuning.

  • Review database-related changes from engineering teams (e.g. database migrations).

  • Recommend query and schema changes to optimize the performance of database queries.

  • Jump on a production incident to mitigate database-related issues on GitLab.com.

  • Participate actively in infrastructure design and scalability considerations, focusing on data storage aspects.

  • Make sure we know how to take the next step to scale the database.

  • Design and develop specifications for future database requirements, including enhancements, upgrades, and capacity planning; evaluate alternatives; and make appropriate recommendations.

Performance Indicators

Site Reliability Engineers have the following job-family performance indicators:

How GitLab will support you

Please note that we welcome interest from candidates with varying levels of experience; many successful candidates do not meet every single requirement. Additionally, studies have shown that people from underrepresented groups are less likely to apply to a job unless they meet every single qualification. If you're excited about this role, please apply and allow our recruiters to assess your application.

Date Posted

03/11/2024
