Staff Site Reliability Engineer

Checkr · Remote

Company: Checkr

Location: Remote

Type: Full Time

Job Description

Checkr’s mission is to build a fairer future by designing technology to create opportunities for all. We believe all candidates, regardless of who they are, should have a fair chance to work. Established in 2014 and valued at $5B, Checkr is using technology to bring hiring to the next level. Our People Trust Platform uses machine learning to help thousands of companies modernize their background check process and make hiring safer, more efficient, and more inclusive. Our customers include Uber, Instacart, DoorDash, Netflix, Compass Group, and Adecco.

A career with Checkr is an opportunity to work with some of the best and brightest minds, disrupt an industry for a better future, and give otherwise overlooked candidates access to employment. Checkr has been recognized in Forbes Best Startup Employers and is a top Y Combinator company by valuation.

We’re looking for a Staff Site Reliability Engineer (SRE). The Staff SRE must have extensive observability and mentorship experience to help:

  •  Lead the administration of tools like Datadog, Sentry, and PagerDuty
  •  Identify strategies to improve our full-stack telemetry and monitoring capabilities
  •  Mentor other SREs who contribute to observability-related work
  •  Drive team objectives, goals, and key performance metrics

SREs work cross-functionally with Core Infrastructure, Platform, and Product Engineering, combining operations work with software engineering principles to enable high availability of Checkr’s production systems. You will serve as a partner to our Product Engineering teams to help make their services more performant, scalable, observable, and reliable. We believe every engineering team at Checkr should be responsible for the software they build, and SREs play a critical part in providing the tools, practices, and expertise to make that happen.

We are growing and evolving the SRE team to help meet Checkr’s product-first reliability goals for 2023 and beyond. Having established a strong foundation, including a containerized microservices architecture (AWS, Kong, Kubernetes, Kafka, MySQL, and MongoDB), CI/CD, full-stack monitoring, structured incident response, and a blameless postmortem culture, we are focused on implementing new capabilities like:

  • Automated observability and alerting across an ever-changing landscape of microservices
  • Automated Service Reliability Scorecards and Production Readiness Standards
  • Greater organizational maturity through evolving and improving reliability and software engineering best practices
  • Software engineering project work, proposed and driven by individual SRE team members, to remove operational bottlenecks and increase velocity in ways we’ve never considered before


Responsibilities:

  • Expand and improve our observability and monitoring footprint
  • Collaborate with the engineering manager, product managers, other SREs, and cloud infrastructure engineers to create architectural plans, define project requirements, and establish technical standards
  • Connect with non-engineering business units across the organization to better understand the reliability and incident management needs of Checkr and our customers
  • Pair program with team members, review merge requests, help engineers get unblocked, and provide peer mentoring
  • Address common operational challenges by building tools and automation scripts
  • Serve as the on-call incident commander to help debug and drive resolution of reliability issues, contribute to the postmortem, and work to prevent recurrence
  • Participate in design and production reviews for new features, products, and infrastructure
  • Audit and tune the configuration of systems owned by other engineering teams
  • Plan for the growth of Checkr’s infrastructure and infrastructure reliability/resiliency


What you bring:

  • Combination of experience in both software engineering and operations 
  • 7+ years working in a relevant role, including 3+ years of technical leadership experience mentoring junior engineers
  • 3+ years of experience architecting and administering observability stacks, either managed or self-hosted (e.g., Datadog, New Relic, Prometheus, Elastic Stack/ELK)
  • Experience operating containerized microservices running on public cloud, asynchronous event processing, and databases
  • Strong command of Linux, Git, and CI/CD pipelines
  • Experience with on-call support of highly available production systems
  • Ability to design and build new tools to automate repetitive tasks, prevent incidents, or improve TTR using an object-oriented programming language such as Python
  • Experience with Infrastructure as Code using tools like Terraform, Terragrunt, Ansible, or CloudFormation
  • Act as the resident technical expert for our team to share knowledge, experience, and expertise, focusing on the more senior members when possible
  • Understand how application components interact, and contribute to architectural discussions
  • Unwavering commitment to operational security and best practices
  • Ownership: identify problems but also propose solutions, then go out and implement them, from submitting a merge request on another team’s repository to scoping out a new reliability project
  • Connection: motivated to help other teams improve their service reliability through reviews, pair programming, hands-on training, and continuous improvement of tooling and services
  • Experience with and interest in chaos engineering (Gremlin, Litmus, Chaos Mesh) is a nice-to-have but not required
  • Work with the SRE manager and other engineering managers to define SLOs to help drive SLA compliance


What you get:

  • A fast-paced and collaborative environment
  • Learning and development allowance
  • Competitive compensation and opportunity for advancement
  • 100% medical, dental, and vision coverage
  • Up to $25K reimbursement for fertility, adoption, and parental planning services
  • Flexible PTO policy
  • Monthly wellness stipend and home office stipend

 

One of Checkr’s core values is Transparency. To live by that value, we’ve made the decision to disclose salary ranges in all of our job postings. We use geographic cost of labor as an input to develop ranges for our roles and, as such, each location where we hire may have a different range. If this role is remote, we have listed the full possible range, from bottom to top, but we will specify the target range for an exact location when you are selected for a recruiting discussion. The salary range for this role is $118,346 to $246,330.

 

Equal Employment Opportunities at Checkr

Checkr is committed to hiring talented and qualified individuals with diverse backgrounds for all of its tech, non-tech, and leadership roles. Checkr believes that the gathering and celebration of unique backgrounds, qualities, and cultures enriches the workplace.   

Checkr also welcomes the opportunity to consider qualified applicants with prior arrest or conviction records. Checkr’s commitment to diversity extends to hiring talented individuals in spite of a prior criminal history in accordance with local, state, and/or federal laws, including San Francisco’s Fair Chance Ordinance.

 

Date Posted: 03/07/2023
