Job Description
About Zscaler
Zscaler accelerates digital transformation to ensure our customers can be more agile, efficient, resilient, and secure. As an AI-forward enterprise, we are constantly pushing the envelope, leveraging the world’s largest security data lake to power our cloud-native Zero Trust Exchange platform. This innovation protects our customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location.
Here, impact in your role matters more than title, and trust is built on results. We say impact over activity. We seek innovators who actively use AI to amplify their impact and who thrive in an environment where we leverage intelligent systems to stay ahead of evolving threats. We believe in transparency and value constructive, honest debate; we’re focused on getting to the best ideas faster. We build high-performing teams that can make an impact quickly and with high quality. To do this, we are building a culture of execution centered on customer obsession, collaboration, ownership, and accountability.
We value high impact and high accountability with a sense of urgency, where you’re enabled to do your best work and embrace your potential. If you’re driven by purpose, thrive on solving complex challenges, and want to be part of the team that’s helping to secure the AI age, we invite you to bring your talents to Zscaler and help shape the future of cybersecurity.
Role
We are looking for a Sr. Staff Production Engineer to join our team. This role is available as a hybrid opportunity, 3 days a week in San Jose, CA, or as a remote position, reporting to Production Engineering in the Cloud Infrastructure & Operations department. Join Zscaler to be a force multiplier for the reliability of a global platform protecting over 15 million users.
In this role, you will provide the technical vision and hands-on execution to drive an "automation-first" culture across the company. By maturing our observability and architectural standards, you will directly reduce our Mean Time to Mitigate (MTTM) and shape the scalability of our globally distributed, multi-cloud infrastructure.
What You’ll Do (Role Expectations)
- Design and implement highly available, scalable infrastructure across AWS, Azure, GCP, and bare-metal environments
- Drive an "automation-first" culture by writing code (Python/Go) to eliminate manual toil and build self-healing systems
- Implement and maintain sophisticated observability (Prometheus, Grafana, OpenTelemetry), define SLIs/SLOs, and establish error budgets
- Act as a lead Incident Commander (TDO on-call), develop response playbooks, and conduct deep-dive post-incident analyses
- Partner with Engineering and partner teams to conduct operability reviews
Who You Are (Success Profile)
- You act like an owner with a bias for action and integrity.
- You are a pragmatic builder obsessed with creating, iterating, and shipping.
- You champion simplicity by distilling complex problems into clear, actionable plans.
- You are data-driven, valuing evidence over assumptions.
- You think at scale, building solutions and processes that last in a high-growth, global organization.
What We’re Looking for (Minimum Qualifications)
- 8+ years of experience managing reliability, scalability, and availability for large-scale production services
- Deep expertise in programming (e.g., Python, Go, or C/C++)
- Strong background in networking protocols, Linux/FreeBSD systems, and distributed architecture
- Experience in high-stakes incident management and participation in a 24/7 on-call rotation
- Proficiency in leveraging ITIL frameworks and incident data to drive service maturity through systematic problem management and technical operability reviews
What Will Make You Stand Out (Preferred Qualifications)
- Extensive experience with public cloud (AWS, Azure, GCP) and Infrastructure-as-Code (Ansible, Terraform)
- Experience with chaos engineering and disaster recovery planning at scale
- Expertise in global routing (BGP) and traffic tunneling (GRE, IPSec), with a deep understanding of L7 proxy architectures (HAProxy), DNS at scale, and OS networking stack internals
Zscaler’s salary ranges are benchmarked and are determined by role and level. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations and could be higher or lower based on a multitude of factors, including job-related skills, experience, and relevant education or training.
The base salary range listed for this full-time position excludes commission/bonus/equity (if applicable) and benefits.
At Zscaler, we are committed to building a team that reflects the communities we serve and the customers we work with. We foster an inclusive environment that values all backgrounds and perspectives, emphasizing collaboration and belonging. Join us in our mission to make doing business seamless and secure.
Our Benefits program is one of the most important ways we support our employees. Zscaler proudly offers comprehensive and inclusive benefits to meet the diverse needs of our employees and their families throughout their life stages, including:
- Various health plans
- Time off plans for vacation and sick time
- Parental leave options
- Retirement options
- Education reimbursement
- In-office perks and more!
Learn more about Zscaler’s Future of Work strategy, hybrid working model, and benefits here.
By applying for this role, you adhere to applicable laws, regulations, and Zscaler policies, including those related to security and privacy standards and guidelines.
Zscaler is committed to providing equal employment opportunities to all individuals. We strive to create a workplace where employees are treated with respect and have the chance to succeed. All qualified applicants will be considered for employment without regard to race, color, religion, sex (including pregnancy or related medical conditions), age, national origin, sexual orientation, gender identity or expression, genetic information, disability status, protected veteran status, or any other characteristic protected by federal, state, or local laws. See more information by clicking on the Know Your Rights: Workplace Discrimination is Illegal link.
Pay Transparency
Zscaler complies with all applicable federal, state, and local pay transparency rules.
Zscaler is committed to providing reasonable support (called accommodations or adjustments) in our recruiting processes for candidates who are differently abled, have long-term conditions, mental health conditions, or sincerely held religious beliefs, or who are neurodivergent or require pregnancy-related support.
What We Do
Zscaler accelerates digital transformation so our customers can be more agile, efficient, resilient, and secure. Our cloud-native Zero Trust Exchange platform protects thousands of customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location.
Why Work With Us
Our impact comes from how we work: every day, in every decision, as one team. Our values and leadership principles are more than words; they're our operating system. They fuel a culture of execution where trust, transparency, and accountability help us deliver meaningful results for our customers, colleagues, and our company. www.zscaler.com/culture
Zscaler Offices
Hybrid Workspace
Employees engage in a combination of remote and on-site work.
Date Posted
05/30/2026