Job Description
Who we are
About Stripe
Stripe is a financial infrastructure platform for businesses. Millions of companies, from the world’s largest enterprises to the most ambitious startups, use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.
About the team
The Incident Ops team is a global 24/7 team responsible for driving incident response and management from detection to resolution. Stripe is proud of its five-nines reliability, and this team is at the forefront of keeping it that way, working hand-in-hand with Reliability Engineering and across the Tech Org. This team of incident response managers (IRMs) is defined by its sense of ownership and by how it drives incidents to resolution, marshaling the necessary cross-functional resources to respond to and resolve service outages, critical bugs, security attacks, and anything else that significantly impacts the users of our products. The team is user-first and ensures appropriate external communications from Stripe and senior management to keep our users informed of disruptions to their experience of Stripe. The team combines communication skills, incident-handling skills, and technical adeptness, since incidents can arise anywhere and cut across products and orgs at Stripe.
What you’ll do
As an Incident Response Manager (IRM), you’ll play the key role in driving the right level of response from Stripes to incidents: determining impact, rallying Stripes to mitigate, communicating with users, ensuring appropriate remediations, and orchestrating the Root Cause Analysis (RCA) process. You’ll work hand-in-hand with IRMs and engineers globally to ensure solid 24/7 coverage of how we monitor, detect, respond to, communicate about, and mitigate incidents. When not managing incidents, you’ll help scale our ability to respond to incidents, improve our operations, analyze data to provide insights, and deepen our technical expertise in our products. As a result, you’ll be seen as the protector of our users, minimizing the impact of incidents on their businesses and ensuring that Stripe always keeps its users in mind.
Responsibilities
- Act as an on-call Incident Commander responsible for driving and managing incident resolution with a high level of urgency, cross-functional collaboration, and accuracy, while partnering with a global and diverse set of teams, including Engineering, Product, Policy, Risk, PR, Legal, Execs, etc.
- Lead all user-facing incidents across domains at Stripe, including reliability, technical, security, and data privacy incidents
- Take a 'User First' approach to determining impact, providing accurate situation reports, facilitating comms bridges, and ensuring useful and timely external communications to users
- Proactively update internal stakeholders, make decisions based on data, and influence outcomes by partnering with Engineering, Sales, Support, and other cross-functional teams
- Contribute to the root cause analysis process by conducting post-mortems and identifying remediations, and ensure problem management tasks meet SLAs and user expectations
- Drive improvements in the incident handling process and in incident management metrics and tooling, based on trends and data from Stripe's incidents, in collaboration with engineering, product, and operations teams
- Contribute to processes, projects, forums, or groups that positively impact and grow team culture
Who you are
We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.
Minimum requirements
- 4+ years of demonstrable major incident experience at organizations that run mission-critical applications or always-on SaaS environments.
- Demonstrated ability to lead multiple incidents concurrently with authority and to influence responders, with the agency and reasoning skills to resolve ambiguous problems and drive to root cause.
- Intermediate understanding of application development, application architectures, and applications deployed in cloud environments
- Good understanding of infrastructure, including physical, virtual, and container-based compute platforms
- Demonstrated quantitative and analytical skills in data manipulation using SQL, Splunk, or other tools.
- Excellent task management skills; detail-oriented, with the ability to remain composed and methodical and to think fast in a high-pressure environment.
- Exceptional written and verbal English communication skills, with the ability to translate complex technical issues for internal and external stakeholders.
Preferred qualifications
- Domain expertise in classes of incidents such as technical, privacy, security, or crisis, with a strong desire to continuously learn about Stripe's products, technical issues, and systems.
- Ability to review complex technical details regarding ongoing issues/events and convey the key details to senior stakeholders to facilitate real-time decision making.
- Experience with broad user-facing communications (e.g., status pages, tweets) and/or targeted communications (e.g., direct emails, support ticket responses).
- Familiarity with operating or managing distributed architectures, with the ability to correlate system behaviors based on known interdependencies.
- Demonstrated understanding of full-stack development and support
Date Posted
09/19/2024