AI DevOps & Reliability Engineer

Jobgether · Canada

Company

Jobgether

Location

Canada

Type

Full Time

Job Description

Team: IT

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an AI DevOps & Reliability Engineer based in Canada.

This is a high-impact engineering role at the intersection of platform reliability, DevOps automation, and AI-driven operations. You will be responsible for shaping how software is built, deployed, and operated across a large-scale, high-traffic SaaS environment. The role combines central platform ownership with hands-on embedding within engineering teams to improve delivery speed, system resilience, and operational maturity. You will design and evolve CI/CD pipelines, deployment automation, and infrastructure standards that enable safe, continuous releases. A key focus is driving the adoption of AI into DevOps workflows, including incident response, observability, and runbook automation. This is a 0-to-1 environment where you will influence architecture, culture, and engineering practices at scale.

Accountabilities:

In this role, you will own the end-to-end delivery and reliability ecosystem, building platforms and practices that enable fast, safe, and scalable software delivery across engineering teams.

  • Design, build, and evolve CI/CD pipelines, deployment automation, and release frameworks that enable continuous and on-demand production delivery
  • Define and enforce engineering standards for progressive delivery, rollback strategies, quality gates, and deployment safety mechanisms
  • Build and manage self-service environments (dev, staging, and ephemeral) that replicate production and accelerate development cycles
  • Drive AI-augmented DevOps practices, including automated runbooks, intelligent alerting, and AI-assisted incident response workflows
  • Champion Infrastructure as Code and GitOps practices to ensure scalable, repeatable, and secure infrastructure and deployments
  • Own operational reliability practices including observability, incident response, SLO/SLI definition, and on-call readiness
  • Partner directly with engineering teams in an embedded model to improve delivery maturity and operational excellence
  • Track and improve engineering performance using DORA metrics and other reliability indicators

Requirements:

The ideal candidate brings deep DevOps and platform engineering expertise, combined with strong hands-on experience in modern infrastructure and AI-enabled operations.

  • 7+ years of experience in DevOps, platform engineering, SRE, or infrastructure-focused roles in high-scale environments
  • Strong hands-on experience with Kubernetes and AWS in production systems
  • Deep expertise in Infrastructure as Code tools such as Terraform and/or CloudFormation
  • Proven experience designing and operating CI/CD pipelines with strong governance, automation, and quality controls
  • Experience implementing GitOps workflows using tools such as Argo CD or Flux
  • Hands-on experience operating high-scale systems including Kafka and distributed data infrastructure
  • Strong software engineering and automation skills using Python, Bash, or similar languages
  • Experience with observability tooling such as Prometheus, Grafana, PagerDuty, and related monitoring stacks
  • Practical experience with incident management, on-call rotations, and reliability engineering best practices
  • Demonstrated experience integrating AI tools or agentic workflows into DevOps or SRE processes
  • Strong communication skills with the ability to influence, mentor, and collaborate across engineering teams

Benefits:

  • Competitive base salary with performance-based annual bonus
  • Equity opportunities for eligible roles
  • Fully remote work within Canada
  • Comprehensive health, dental, and vision coverage
  • Generous paid time off and flexible work arrangements
  • Learning and development support, including courses and training programs
  • Parental leave and family support benefits
  • Opportunity to work on high-impact systems in a fast-scaling engineering environment
  • Strong culture of ownership, autonomy, and technical excellence


Date Posted

07/03/2026
