Site Reliability Engineer II

Abnormal Security · Canada

Type

Full Time

Job Description

About The Role

Enterprises of all sizes trust Abnormal Security’s cloud products to stop cybercrime. These products must scale with our customers’ growth and remain reliable and available through resilient design. This is where our SRE fits in: preventing and detecting outages that impact the Abnormal Security Platform, remediating them efficiently, and recovering from them quickly.

Come empower the rest of engineering to stop cybercrime as we expand our offerings across both clouds and regions.

There are a lot of opportunities for growth and career advancement – it’s up to you to own your career here. Some potential career paths for this role include:

Positioning yourself to be a founding member of a team that will have an outsized impact on the rest of the company.
Growing into a Senior technical leadership role.

What You Will Do

Deployment Operations

Build tools and processes to standardize deployment of the Abnormal Security product suite in a multi-datacenter setup.
Partner with R&D teams to develop pre- and post-deployment checklists, canary test environments and workflows, and safe rollback processes.

Incident Prevention

Identify gaps in existing processes and advocate for necessary changes to improve overall system stability and availability.
Lead the Production Readiness Review process to ensure the resilience of systems before customer deployment.
Oversee the Critical Change Management Review process for the safe application of changes to critical services.
Develop and enforce architecture guidelines to minimize downtime and ensure high system availability.

Detection

Establish a consistent definition of metrics that answer “Is this product working?”
Define and monitor SLAs/SLOs for critical systems, actively tracking deviations and triggering alerts when necessary.

Remediation

Define incident severity classification guidelines and implement incident response protocols to promptly address issues and reduce downtime.
Facilitate effective communication between Engineering and Customer Success teams during incidents.

Incident Recovery

Design and implement tools to expedite system recovery and minimize the impact of incidents.
Develop guidelines for Post Mortems after incidents to prevent recurrence.

Must Have

Bachelor’s in Computer Science, Computer Engineering, or equivalent professional experience
4+ years of experience as a Site Reliability Engineer, responsible for the reliability of shared services
Experience with a public cloud provider (AWS, Azure, GCP), observability stack (Prometheus, Grafana), and incident management tools (PagerDuty, Sentry, Slack integration).

Nice To Have

Experience with defining and implementing SRE practices such as Change Management, Production Readiness Review, and Incident Post Mortems.
Experience with container orchestration, preferably Kubernetes and Helm.
Experience developing Infrastructure as Code (IaC) modules and building automation, preferably Terraform.

Date Posted

10/17/2024
