Principal Site Reliability Engineer

Jobgether · Canada

Company: Jobgether

Location: Canada

Type: Full Time

Job Description

Team: IT

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal Site Reliability Engineer in Canada.

This role offers the opportunity to own and enhance the reliability, scalability, and security of a complex cloud infrastructure supporting mission-critical workloads. You will work hands-on across multi-region AWS/EKS environments, partnering with engineering leads, ML and simulation teams, and customer-facing teams to drive operational excellence. This position requires deep technical expertise, strong problem-solving skills, and the ability to take end-to-end ownership of large-scale infrastructure projects. You will lead incident response, implement automated remediation, and guide cloud architecture decisions while optimizing performance, security, and cost. The role is ideal for someone who thrives in a fast-paced, high-autonomy environment and enjoys shaping infrastructure strategies that directly impact customer success.

Accountabilities:

  • Own and evolve cloud infrastructure, ensuring high availability, reliability, and scalability across AWS and EKS environments
  • Manage Kubernetes clusters, including node pool strategy, AMI lifecycle, autoscaling, and workload health monitoring
  • Lead incident response and root cause analysis, and implement systemic fixes to reduce MTTR
  • Oversee cloud security and access management, including IAM governance and compliance readiness
  • Collaborate with cross-functional teams to drive infrastructure design, cost optimization, and next-generation deployment strategies
  • Support CI/CD pipelines, GitOps workflows, and developer experience to enable efficient deployment and troubleshooting
  • Manage networking, VPC design, DNS, load balancing, and cross-region connectivity to support enterprise workloads
  • Provide guidance on infrastructure automation, observability, and monitoring systems

Requirements:

  • 5+ years in SRE, DevOps, or infrastructure engineering roles
  • Strong AWS experience (EKS, EC2, IAM, S3, VPC, CloudFront, KMS) and Kubernetes expertise (cluster operations, node pools, RBAC, Helm, autoscaling)
  • Infrastructure-as-code proficiency, preferably with Terraform, including state management and multi-environment patterns
  • Experience with GitOps, CI/CD pipelines (ArgoCD, GitHub Actions, Jenkins), monitoring, and observability tools (Prometheus, Grafana, Elasticsearch)
  • Solid networking fundamentals, including CIDR design, security groups, DNS, VPN, load balancing, and cross-region connectivity
  • Proficiency in Python and Bash for automation and tooling; familiarity with Linux and Windows environments
  • Strong ownership, problem-solving, and critical thinking skills; ability to prioritize and execute in a high-impact environment
  • Excellent communication skills to collaborate with engineering, product, and customer teams

Benefits:

  • Competitive salary and total compensation package
  • Flexible remote work within Canada
  • Health and wellness support, including medical and dental coverage
  • Generous paid time off and parental leave programs
  • Professional growth opportunities through mentorship, learning, and challenging projects
  • Exposure to cutting-edge cloud infrastructure, SRE practices, and high-performance computing workloads

Date Posted: 04/08/2026

