Senior Site Reliability Engineer

Location: Remote
Type: Full Time

Job Description

Posted 10 Hours Ago
San Francisco, CA, USA
Hybrid
167K-226K Annually
Senior level
Security • Software • Cybersecurity • Automation
Drata is on a mission to help build trust across the internet.
The Role
As a Senior Site Reliability Engineer, you will enhance the reliability of Drata’s product teams through automation, architecture reviews, and operational excellence using cloud-native technologies.

Our Mission & Values:
At Drata, we help companies earn and keep the trust of their users, customers, partners, and prospects. We’re the proof layer that shows great companies deserve the trust they aim to build.

We live our values every day. Built on Trust means consistency is everything. Act with Integrity by always doing the right thing. Being Customer-Obsessed keeps the people we serve at the center of our work. Competitive Fire drives us to push ourselves harder than anyone else. Diversity brings unique perspectives that lead to better solutions. Automation First ensures we save time and money by making efficiency a priority.

Our Culture & Work Style 🚀

At Drata we’re not just building software - we’re building a mindset. Everything we do springs from:

  • Be a Driver (Owner‑Operator Mentality): Own your work. Improve relentlessly. Deliver results.

  • Move at Drata Speed (Precision & Velocity): Fast decisions. Quick learning. Immediate impact.

  • Stay Mission-Driven (Customer‑Obsessed): Challenge assumptions. Deliver value. Stay hungry.

We pair that high-velocity culture with a thoughtful hybrid model because we believe flexibility and collaboration both matter. That’s why, in the Bay, we come together in-office Tuesday through Thursday, our high‑impact collaboration days, where teams align, strategize, and innovate. Mondays and Fridays are flexible, giving you space for focused work, balance, and autonomy.

If you thrive when you’re empowered, energized, and working with smart, mission-driven people, you’ll feel at home here.

Why Join The Drata Team?

The best way to understand the Driver’s Mindset is to see it in action. We’re an award-winning, mission-driven team of 600+ people worldwide, united by a culture that values trust, speed, and continuous growth.

  • See the Speed: Watch our CEO Adam Markowitz discuss the hyper-growth journey from $0 to $100M ARR in just four years

  • Hear the Voice of the Team: Explore our "Life at Drata" page for employee testimonials on our collaborative culture and the growth opportunities available.

  • Experience the Impact: See why we are consistently recognized on Fortune's Best Workplaces lists.

  • Connect with Us on Socials: LinkedIn - follow us for company updates, employee stories, and career news.

Job Summary:

Drata's SRE team operates as both a central engineering function and an embedded reliability practice. You'll be part of a close-knit SRE team where you grow your career, shape standards, and collaborate with peers - while also serving as the dedicated reliability partner for one of Drata's product engineering teams across the full lifecycle of their work.

This is a highly technical role at the intersection of software engineering and systems engineering. The best SREs at Drata are engineers first: they solve problems by building solutions, not by executing manual processes. Automation is a core value, and nowhere is that more visible than in how we approach reliability.

Our infrastructure runs on AWS across multiple accounts, defined entirely in Terraform. You'll work across a modern cloud-native stack to help Drata scale reliably for a rapidly growing customer base.

What you’ll do:

Reliability Architecture for Your Product Team

You are the reliability expert for your aligned product team. You engage early - during architecture reviews and design discussions - to surface risks before they become incidents.

  • Lead Production Readiness Reviews (PRRs) before new services launch, with the authority to flag gaps and gate launches when critical reliability standards aren't met

  • Partner with product engineering leads and staff engineers to define SLOs and SLIs for critical services, turning reliability from a vague goal into a measurable commitment

  • Participate in team planning and architecture reviews to provide proactive reliability guidance

  • Build reusable artifacts - SLO templates, observability checklists, alerting standards, reference dashboards - that raise the reliability floor across the team, not just the services you touch directly

Eliminating Toil Through Engineering

You handle operational needs from your product team, but your job isn't to be a help desk. Your goal is to make each request the last of its kind. When an engineer needs something, your priority is: automate it so anyone can do it → document it so the team can self-serve → execute it manually only as a last resort.

  • Build and maintain Datadog monitors, dashboards, and alert routing - enforcing infrastructure-as-code standards via Terraform so those resources are owned, versioned, and auditable

  • Handle infrastructure requests: ECS task management, secret rotations, Terraform changes, capacity adjustments

  • Identify repeated manual work and convert it into self-service tooling or runbooks

  • Audit existing services for reliability anti-patterns and surface top risks before they cause incidents

Central SRE Platform Work

Beyond your product team, you contribute to cross-cutting infrastructure, tooling, and standards that benefit every team at Drata. Recent examples include automated Datadog governance workflows, dynamic AWS account provisioning, and disaster recovery exercises.

  • Design and build shared platform infrastructure - reusable Terraform modules, standardized observability stacks, service templates - so reliability improvements compound across the organization

  • Participate in the on-call rotation and lead incident response when needed; conduct thorough post-incident reviews to drive lasting fixes

  • Design and manage CI/CD pipelines using GitHub Actions

  • Contribute to evolving SRE standards, tooling, and practices across the organization

What you'll bring:

  • 6+ years of experience in Site Reliability Engineering, Cloud Engineering, or building and maintaining scalable, resilient services

  • Robust knowledge of cloud computing technologies: Terraform, Docker, Git, and Linux

  • Hands-on experience with Datadog for monitoring, alerting, dashboards, SLO tracking, and distributed tracing

  • Experience building software systems as a software engineer

  • Experience developing tooling and automation in Python and/or Bash

  • Experience with CI/CD pipeline automation, specifically GitHub Actions

  • Experience with disaster recovery practices and incident management

  • Strong understanding of observability concepts - monitoring, logging, distributed tracing, and metrics - and how to apply them to production systems

  • Experience with container orchestration and deployment technologies, including AWS ECS, Fargate, and/or Kubernetes

  • Experience working with relational databases (MySQL proficiency is a plus)

  • Ability to take ownership of problems and act on them independently in a constantly evolving environment

Nice to Have:

  • Experience with AIOps - using AI/ML-based tooling for anomaly detection, predictive alerting, or automated incident triage

  • Familiarity with the reliability characteristics of AI/ML-backed services (e.g. LLM inference latency, non-determinism, prompt pipeline observability)

  • Experience with the JavaScript/Node.js ecosystem

  • Certified Kubernetes Administrator (CKA) certification

  • Familiarity with compliance frameworks like SOC 2, ISO 27001, or NIST

AI Experience (required - at least one of the following):

  • Hands-on experience using AI-assisted development tools (e.g. GitHub Copilot, Cursor, or similar) to accelerate automation, scripting, or infrastructure work

  • Demonstrated use of AI/AIOps capabilities for reliability tasks - anomaly detection, incident triage, runbook generation, or alert noise reduction

  • Familiarity with the operational characteristics of AI/ML-backed services and what it means to make them observable and reliable in production

  • Demonstrated passion for AI through personal projects, contributions, or continuous learning in the context of infrastructure or reliability engineering

How we support you:
At Drata, our people are our strongest advantage, and we prove it with support that exceeds industry standards. Our total rewards package is designed to power your well-being, accelerate your growth, and keep your work-life balance thriving.

Explore how we invest in your Life at Drata.

  • Shared Success: We provide stock equity to ensure that as the company grows, you share directly in that success. Equity gives every employee a sense of ownership and the opportunity to celebrate our wins together, because your contributions don’t just support our progress; they help drive our collective success.

  • Health & Wellness: Up to 100% employer-paid premiums for medical, dental, and vision coverage for employees and their dependents, along with comprehensive wellness benefits and healthcare concierge services designed to support your needs beyond traditional insurance.

  • Financial Well-being: A comprehensive suite of financial benefits, including a 401(k) plan, company-paid life and disability insurance, tax-advantaged spending accounts, and a range of discounted voluntary offerings to help you customize and strengthen your overall financial position.

  • Family Support: We want to support you in life's most important moments, so we offer a paid Parental Leave policy after six months of employment. Employees also receive access to Kindbody fertility and family-building benefits, and dedicated leave specialists who help guide you through the entire process.

  • Growth & Development: Generous annual stipends for both professional and personal development, empowering you to invest in your continued growth. You’ll also have access to a wide range of internal learning opportunities, ensuring you can build new skills, deepen your expertise, and advance your career with confidence.

  • Time Off & Flexibility: We believe that to do your best work, you should get the time you need for rest, rejuvenation, and recovery. Drata offers a flexible vacation policy, paid holidays, and other perks to recharge.

This role will receive a competitive base salary, benefits, and stock, typically in the form of Restricted Stock Units (RSUs). The applicable salary range for this role is: $166,900 - $225,900.

A variety of factors are considered when determining someone’s leveling and compensation, including a candidate’s professional background and experience. These ranges may be modified in the future, and final offer amounts may vary from the amounts listed above.


The Company
Sydney
600 Employees
Year Founded: 2020

What We Do

Trust Automated. Drata automates your compliance journey from start to audit-ready and beyond, and provides support from the security and compliance experts who built it. The company is backed by ICONIQ Growth, Alkeon Capital, Salesforce Ventures, GGV Capital, Cowboy Ventures, Leaders Fund, Okta Ventures, SVCI, SV Angel, and many key industry leaders.

Why Work With Us

With a powerful mission, our people help to build a unique and diverse culture. Drata supports continued professional development, promotional paths, and every opportunity for its people to move fast and reach their full potential. Join our driven team and help build trust across the internet!


Date Posted

04/28/2026
