Senior Site Reliability Engineer

Toast · USA

Company

Toast

Location

USA

Type

Full Time

Job Description

Toast is driven by building the restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love.

At Toast, our Site Reliability Engineers (SREs) are responsible for enabling our engineering teams to ensure customer-facing services and other Toast production systems are running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. 

 

About this roll* (Responsibilities) 

  • Implement and evolve a world-class observability technology stack that allows rapid detection of issues in our system and enables root cause analysis (25%)
    • Provide scalable metrics and dashboarding solutions for R&D 
    • Provide distributed tracing capabilities to visualize and track issues across our complex system 
    • Provide log aggregation and insights for R&D using best-in-class technology
    • Provide a global view of the true customer experience through usage of Real-User Monitoring & external cloud-based solutions
  • Act as a champion for reliability and work with partner teams in different lines of business to improve resiliency and reliability of all services. Champion our uptime targets and enable other teams to improve the way we measure the reliability of the system (25%)
  • Facilitate and drive production triage, incident resolution, and retrospective/root cause analysis to maintain the reliability and uptime of our platform (20%)
    • Leverage a strong understanding of cloud architecture
    • Apply experience developing and operating software on the JVM (Java Virtual Machine) to triage and understand issues within services
    • Diagnose performance bottlenecks and implement optimizations across infrastructure, database, web, and mobile applications
    • Implement strategies to increase system reliability and performance through on-call rotation and process optimization
    • Lead incident post-mortems/retrospectives to surface reliability improvements and drive them to completion
  • Support and enable the adoption of a platform for service resilience testing/chaos engineering to validate that Toast’s architecture is resilient to failure. Build and own a performance testing framework/environment that helps our R&D teams understand the constraints of their services and improve performance (15%)


Do you have the right ingredients*? (Requirements)

  • Extensive and broad industry experience with at least 3-7 years building and running production systems and participating in incident calls
  • Deep understanding of cloud and microservice architecture, and the JVM
  • Comfortable reading, writing, and debugging code
  • Experience with observability platforms (Datadog, Splunk, New Relic, etc.), including APM, RUM, and synthetic monitoring
  • Demonstrated experience working with at least one major cloud platform (AWS, GCP, or Azure)
  • Exposure to complex, mission-critical, and large-scale distributed systems
  • Polyglot technologist/generalist with a thirst for learning

 

Our Spread* of Total Rewards
We strive to provide competitive compensation and benefits programs that help to attract, retain, and motivate the best and brightest people in our industry. Our total rewards package goes beyond great earnings potential and provides the means to a healthy lifestyle with the flexibility to meet Toasters’ changing needs. Learn more about our benefits at https://careers.toasttab.com/toast-benefits.

*Bread puns encouraged but not required


#LI-Remote

The starting pay rate for this role is below. Please note that there is not a range for this role; the number listed below is the rate.
Pay Rate
$131,000—$210,000 USD

 

We are Toasters

Diversity, Equity, and Inclusion is Baked into our Recipe for Success.

At Toast, our employees are our secret ingredient. When they are powered to succeed, Toast succeeds.

The restaurant industry is one of the most diverse industries. We embrace and are excited by this diversity, believing that only through authenticity, inclusivity, high standards of respect and trust, and leading with humility will we be able to achieve our goals.

Baking inclusive principles into our company and diversity into our design provides equitable opportunities for all and enhances our ability to be first in class in all aspects of our industry.

Bready* to make a change? Apply today!

Toast is committed to creating an accessible and inclusive hiring process. As part of this commitment, we strive to provide reasonable accommodations for persons with disabilities to enable them to access the hiring process. If you need an accommodation to access the job application or interview process, please contact [email protected].


Date Posted

08/31/2024
