Staff Software Engineer - Reliability (US Citizen Only)

Location

Remote

Type

Full Time

Job Description

Palo Alto, CA, USA
In-Office
218K-328K Annually
Senior level
Artificial Intelligence • Big Data • Cloud • Information Technology • Software • Cybersecurity • Data Privacy
Rubrik is on a mission to secure and accelerate the world's AI transformation.
The Role
The Staff Site Reliability Engineer will lead reliability efforts across distributed cloud systems, focusing on architecture, automation, incident management, and team leadership, ensuring operational excellence in SaaS environments.
Summary Generated by Built In
About Team & About Role

The Site Reliability Engineering (SRE) team at Rubrik ensures the absolute reliability, availability, performance, and security of our enterprise infrastructure services, spanning both global SaaS platforms and government-compliant environments. We operate at the intersection of software development and systems engineering, prioritizing hyperscale platform automation, self-healing architectures, and structural resiliency. As a Staff Site Reliability Engineer, you will serve as a primary technical leader and architect across our broader distributed cloud systems. You will drive long-term technical roadmaps, establish cross-organizational reliability standards, and solve complex distributed systems challenges that safeguard both enterprise and public sector environments.

Beyond the core SRE charter, this Staff role also leads the Application-SRE team, a US-based group that partners closely with Engineering, Sales, and Support to unblock POCs, drive complex customer escalations to resolution, and convert recurring field signals into engineering and reliability roadmap items. You will be the technical leader and project owner for Application-SRE: setting direction, tracking commitments, and ensuring the team operates as a high-leverage bridge between the field and the broader engineering org.


What You'll Do

As a Staff Site Reliability Engineer, you will possess engineering-wide influence and take ownership of the following critical areas:

  • Infrastructure Strategy & Architecture: Formulate and execute the architectural vision for Rubrik's Cloud Platform, optimizing backend infrastructure systems like Kubernetes, MySQL, and cloud-native services for performance, security, and multi-region scale.
  • Hyperscale Automation & Platform Tooling: Build, scale, and maintain sophisticated custom internal tools, platform controllers, and automation frameworks in Go or Python to systematically eliminate operational toil.
  • AI Infrastructure for SaaS: Deploy, scale, and operate the AI infrastructure that powers Rubrik's SaaS offerings, owning the reliability, performance, cost, and security controls required to run AI workloads in multi-tenant, compliance-bound environments.
  • AI for SRE & Engineering Productivity: Drive the adoption of AI-driven solutions across the SRE charter to compress toil and multiply the org, applying agentic and LLM-based approaches to automated triage, incident response, operational analysis, and developer productivity.
  • AI Adoption Guardrails for SaaS Reliability: Build the guardrails, controls, and platform patterns that keep Rubrik's SaaS reliable as AI adoption accelerates across product and engineering, ensuring new AI capabilities ship without eroding availability, performance, security, or cost posture.
  • Cross-Functional Leadership: Wield engineering-wide influence to create technical consensus among component, platform, and security engineering teams, effectively "shifting left" to embed structural resilience, capacity guards, and compliance from initial feature designs.
  • Reliability Governance: Define, audit, and enforce robust Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets across all critical enterprise platform services, translating telemetry insights into actionable product roadmaps during executive reviews.
  • Incident Command & Operations Review: Serve as a primary Incident Commander for high-severity cloud outages, establishing roles, directing mitigation vectors under pressure, and orchestrating comprehensive, blameless post-mortems that drive durable systemic fixes.
  • Cost Governance & Capacity Modeling: Architect cost-observability tools and attribution frameworks, leading cloud infrastructure capacity forecasting, resource quota optimization, and vendor SLA management.
  • Application-SRE Leadership: Set the technical direction for the Application-SRE team, raising the bar on how the team diagnoses, mitigates, and durably resolves the most complex customer-impacting issues across our platform.
  • Technical Multiplier & Mentorship: Champion SRE best practices, mentoring senior and junior individual contributors across the organization, participating in interview frameworks, and actively raising the collective technical bar.
  • On-Call Rotations: Participate in on-call rotations.
Experience You'll Need
  • Citizenship & Residency: Must be a US Citizen currently residing on CONUS soil (strict regulatory requirement to enable support for federal and FedRAMP environments when required).
  • Education: BS, MS, or PhD in Computer Science, Computer Engineering, or a highly related technical discipline.
  • Industry Experience: 8–12+ years of software engineering and production cloud infrastructure experience, with at least 5 years dedicated to a formal SRE, DevOps, or Platform engineering role operating hyperscale SaaS products.
  • Technical Depth: Comprehensive hands-on programming expertise in Golang, Python, or Java, with a deep grasp of concurrency models, data structures, and test-driven software design patterns.
  • Distributed Systems Expertise: Proven proficiency designing, deploying, analyzing, and auditing complex, large-scale distributed systems, database topologies, and high-availability public cloud meshes.
  • Systems Internals: Authoritative operational command of Unix/Linux operating system environments (process models, file systems, kernels), systems administration, and advanced L4/L7 networking protocols.
  • AI Systems Fluency: Working knowledge of operating AI systems in production, including model serving, cost trade-offs, and the reliability and safety considerations of LLM- and agent-based workloads. Practical judgment on when AI is the right tool versus deterministic automation.
  • Field-to-Product Feedback Loop: Institutionalize the channel that converts patterns from customer escalations and POCs into prioritized product and reliability feedback, partnering directly with Product, Sales, Engineering, and Support leadership.
  • Customer & Field Fluency: Track record of partnering directly with Sales, Support, and customers on escalations and POCs, and translating field signals into engineering action.
  • Leadership Capability: Demonstrated history of technical leadership, mapping architectural dependencies, managing multi-team technical projects, and guiding organizations through critical platform shifts with high technical judgment.
Preferred Qualifications
  • Extensive production experience provisioning, lifecycle-managing, and recovering enterprise-scale Kubernetes (GKE, EKS) deployments and large-scale relational/non-relational databases (MySQL).
  • Prior experience building, certifying, or auditing infrastructure environments under compliance structures such as FedRAMP (High/Moderate), SOC 2, ISO 27001, or CJIS.
  • Fluency in Infrastructure-as-Code (Terraform, Pulumi) module design, multi-tenant state isolation, and enterprise observability fabrics (Prometheus, Grafana, OpenTelemetry).
  • Exposure to building AI- or LLM-powered internal tooling and applying it to SRE operations or engineering productivity use cases.
  • Familiarity with the operational considerations of running AI workloads on cloud and Kubernetes platforms.
The minimum and maximum base salaries for this role are posted below; additionally, the role is eligible for bonus potential, equity, and benefits. The range displayed reflects the minimum and maximum target for new hire salaries for the role based on U.S. location. Within the range, the salary offered will be determined by work location and additional factors, including job-related skills, experience, and relevant education or training.
US Pay Range
$218,300–$327,500 USD
Join Us in Securing and Accelerating the World's AI Transformation

Rubrik (RBRK), the Security and AI Operations Company, leads at the intersection of data protection, cyber resilience, and enterprise AI acceleration. Rubrik Security Cloud delivers complete cyber resilience by securing, monitoring, and recovering data, identities, and workloads across clouds. Rubrik Agent Cloud accelerates trusted AI agent deployments at scale by monitoring and auditing agentic actions, enforcing real-time guardrails, fine-tuning for accuracy, and undoing agentic mistakes.

Inclusion @ Rubrik

At Rubrik, we are dedicated to fostering a culture where people from all backgrounds are valued, feel they belong, and believe they can succeed. Our commitment to inclusion is at the heart of our mission to secure the world’s data.

Our goal is to hire and promote the best talent regardless of background. We continually review our hiring practices to ensure fairness and strive to create an environment where every employee has equal access to opportunities for growth and excellence. We believe in empowering everyone to bring their authentic selves to work and achieve their fullest potential.

Our inclusion strategy focuses on three core areas of our business and culture:
  • Our Company: We are committed to building a merit-based organization that offers equal access to growth and success for all employees globally. Your potential is limitless here.

  • Our Culture: We strive to create an inclusive atmosphere where individuals from all backgrounds feel a strong sense of belonging, can thrive, and do their best work. Your contributions help us innovate and break boundaries.

  • Our Communities: We are dedicated to expanding our engagement with the communities we operate in, creating opportunities for underrepresented talent, and driving greater innovation for our clients. Your impact extends beyond Rubrik, contributing to safer and stronger communities.

Equal Opportunity Employer/Veterans/Disabled

Rubrik is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability.

Rubrik provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, or genetics. In addition to federal law requirements, Rubrik complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.

Federal law requires employers to provide reasonable accommodation to qualified individuals with disabilities. Please contact us at [email protected] if you require a reasonable accommodation to apply for a job or to perform your job. Examples of reasonable accommodation include making a change to the application process or work procedures, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment.

EEO IS THE LAW

NOTIFICATION OF EMPLOYEE RIGHTS UNDER FEDERAL LABOR LAWS

Skills Required

  • 8-12+ years of software engineering and production cloud infrastructure experience
  • 5+ years in a formal SRE, DevOps, or Platform engineering role
  • BS, MS, or PhD in Computer Science or related technical discipline
  • Comprehensive expertise in Golang, Python, or Java
  • Proficiency in designing large-scale distributed systems
  • Operational command of Unix/Linux environments
  • Track record partnering with Sales, Support, and customers
  • Technical leadership and managing multi-team technical projects

Rubrik Compensation & Benefits Highlights

  • Healthcare Strength: Medical coverage includes multiple UMR PPO/HDHP and Kaiser HMO options with 100% preventive care and relatively low employee premiums, plus dental and vision. HSA contributions of $900 individual and $1,800 family further strengthen value for HDHP enrollees.
  • Parental & Family Support: Family-forming benefits include support via Carrot/Progyny with up to $25,000 lifetime reimbursement for fertility, adoption, and surrogacy, alongside paid parental and family care leave. This breadth supports diverse paths to parenthood and caregiving needs.
  • Wellbeing & Lifestyle Benefits: Mental health access via Modern Health provides 10 therapy and 10 coaching sessions at no cost, complemented by a $50/month Forma wellness stipend. Additional perks such as commuter benefits, legal and pet insurance, and hybrid work broaden lifestyle support.

The Company
HQ: Palo Alto, CA
3,000 Employees
Year Founded: 2014

What We Do

Rubrik (NYSE: RBRK), the Security and AI Operations Company, leads at the intersection of data protection, cyber resilience, and enterprise AI acceleration. Rubrik Security Cloud delivers complete cyber resilience by securing, monitoring, and recovering data, identities, and workloads across clouds. Rubrik Agent Cloud accelerates trusted AI agent deployments at scale by monitoring and auditing agentic actions, enforcing real-time guardrails, fine-tuning for accuracy, and undoing agentic mistakes.

Why Work With Us

At Rubrik, we believe in the Power of You. You have limitless potential to grow, innovate, and create meaningful impact. United by our purposeful mission, we empower you to boldly pursue your ambitions, shape the future of cybersecurity, and make your unique mark on what we're building. Join us and unlock your infinite potential.

Rubrik Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Typical time on-site: Not Specified
HQ: Palo Alto, CA
United Arab Emirates
Amsterdam, NL
Austin, TX
Ballincollig, Ballincollig Co.
Bengaluru, IN
Bengaluru, Karnataka
DKI Jakarta, Indonesia
Frankfurt am Main, DE
Lawrence, KS
London, GB
Melbourne, VIC
Milano, IT
Morrisville, NC
Munich, DE
New York, NY
North Sydney, NSW
Paris, FR
Reston, VA
Riyadh, SA
Seattle, WA
Solna, SE
Tel Aviv-Yafo, IL
Vancouver, BC

Date Posted

06/24/2026