Senior Site Reliability Engineer at Jobgether

Team: IT

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in Canada.

This role sits at the core of a fast-scaling, AI-driven intelligence platform, where reliability is not just operational support but a strategic enabler of product innovation. You will design and own the foundations that ensure large-scale, mission-critical systems remain observable, resilient, and performant under demanding AI and data workloads. Acting as a senior individual contributor, you will shape reliability standards, SLO frameworks, and multi-region architecture while directly influencing engineering decisions across the organization. The environment is highly technical, collaborative, and innovation-focused, with a strong emphasis on AI-native systems and automation-first thinking. You will work across software, AI engineering, and platform teams to ensure seamless delivery of complex services. This is a hands-on leadership role for someone who wants to define how modern AI infrastructure operates at scale.

Accountabilities

You will define and own service reliability standards, including SLOs, SLIs, and error budgets, ensuring consistent performance across all production systems.
You will design and implement reliability patterns for AI agent pipelines, including observability, failure detection, and safe degradation mechanisms.
You will architect and improve multi-region infrastructure strategies, driving high availability, disaster recovery readiness, and blast radius control.
You will lead incident response and postmortem processes, ensuring durable fixes and continuous improvement of system resilience.
You will serve as the primary reliability partner for engineering and AI teams, influencing architecture, deployment strategies, and system design decisions.
You will own observability and platform tooling, including service catalog management, Datadog configuration, and AI workload monitoring.
You will develop CI/CD standards and enable self-service developer platforms to improve deployment velocity and system reliability.
You will contribute to FinOps initiatives by improving cost visibility and optimizing infrastructure efficiency across cloud environments.

Requirements

You bring 6–8+ years of experience in Site Reliability Engineering, DevOps, or platform engineering, with senior-level technical ownership responsibilities.
You have deep expertise in AWS and distributed systems architecture, including multi-region, high-availability environments.
You are highly skilled in Kubernetes, Docker, Terraform, and GitOps practices, with strong infrastructure-as-code experience.
You have hands-on experience with observability platforms such as Datadog, including SLO monitoring, alerting, tracing, and log analytics.
You are proficient in scripting and development (Python and/or Bash), with a solid understanding of microservices architectures.
You have strong experience designing and optimizing CI/CD pipelines (e.g., GitHub Actions, Bitbucket Pipelines).
You understand reliability challenges in large-scale systems and can translate complex technical risks into actionable engineering solutions.
You have strong communication and collaboration skills, with the ability to influence cross-functional teams and mentor engineers.
Experience with AI/ML infrastructure, LLM systems, or agent-based architectures is a strong advantage.

Benefits

Competitive compensation in the range of $125,200 – $132,500 CAD.
Comprehensive benefits package including health, dental, vision, and wellness coverage.
RRSP matching and annual fitness reimbursement.
Flexible vacation policy and remote-first work arrangement within Canada.
Access to professional training, development programs, and high-growth career opportunities.
Wellness resources and employee support programs.
Inclusive, diverse, and accessibility-focused work environment.
Opportunities to work on cutting-edge AI and large-scale data infrastructure systems.
