Job Description
About the Role:
- Responsible for SRE team organization and project management, making day-to-day SRE work more effective and improving overall SRE efficiency.
- Drive the design and engineering of tools, as well as platform solutions, to optimize product engineering and operation efficiencies.
- Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.
- Influence and motivate teams across a diverse set of vertical domains and geographic locations to ensure customer and merchant incidents are addressed rapidly and efficiently so that our software applications are available and functional 24x7x365.
- Work with senior management in the event of issue escalation.
- Provide clear communication to executives and key stakeholders regarding the business impact, risks, prioritization, mitigation, and estimated time-to-fix for these incidents on a timely basis.
- Ensure appropriate monitoring is in place for reliable operations of all applications and initiate corrective action plans when appropriate.
- Document incident information in the incident management system and ensure the data is accurate and complete. Doing so will help you identify incident and data trends (including gaps and inaccuracies) through the normal course of incident management (post-mortem).
- Provide informal incident process and requirement training to cross-functional teams, as needed, to support consistent incident management execution.
- Collaborate with Service Owners to define SLOs and build SLIs to ensure systems are meeting their SLAs.
- Responsible for training team members and putting processes and procedures in place to support the system and handle critical incidents.
- Coordinate appropriate resources to resolve critical incidents in accordance with service level agreements and operational level agreements.
- Own all communication during a major system outage, ensuring IT management and the businesses are kept updated until the incident is resolved.
- With a thorough understanding of technology assets/environments/services, business needs and SLAs/SLOs, lead the creation, revision and implementation of monitoring tools, processes and uptime reports.
Key Outcomes:
- Build and invest in relationships with key partners while learning the business and support model.
- Implement AIOps machine learning solutions to automate the detection, consolidation, and remediation of alerts, events, and metrics in our platforms.
- Modernize processes to enable automation for change control, runbooks, documentation publishing, and monitoring solutions.
- Drive adoption of unified processes for Monitoring, Alerting, Incident Response and cross-product visibility as the enterprise product portfolios evolve.
Skills, Experiences and Education:
- B.S. in Electrical or Computer Engineering, Computer Science or relevant work experience
- 7+ years of experience in large, complex information systems and/or cloud environments
- 7 years of experience in an engineering-centric workflow environment
- Broad experience in troubleshooting large-scale distributed systems covering application, cloud, OS, networking, and storage areas
- Self-motivated and proactive, with demonstrated creative and critical thinking capabilities
- A clear communicator, compassionate leader who loves SRE
- Canada: Province of Ontario
D, E & I Mission & Culture at Agero:
We are all Change Drivers at Agero. Each day, we speak to thousands of drivers and tow professionals across one of the most diverse countries in the world. Our mission to safeguard drivers on the road, strengthen our clients’ relationships with their drivers, and support the communities we live and work in unites us as one force driving positive change.
The road to positive change starts inside Agero. In celebrating each other’s differences, we lift each other up and create space for innovation and community. Bringing our whole selves to work powers our commitment, drive, agility, and courage, ensuring we are not only changing the landscape of the driver services industry but also making a difference in the lives of our customers with each call, chat, and rescue.
THIS DESCRIPTION IS NOT INTENDED TO BE A COMPLETE STATEMENT OF JOB CONTENT, RATHER TO ACT AS A GUIDE TO THE ESSENTIAL FUNCTIONS PERFORMED. MANAGEMENT RETAINS THE DISCRETION TO ADD TO OR CHANGE THE DUTIES OF THE POSITION AT ANY TIME.
To review Agero's privacy policy click the link: https://www.agero.com/privacy.
Date Posted
01/27/2023