Senior Software Engineer, Cloud Development

Jobgether · Canada

Company: Jobgether

Location: Canada

Type: Full Time

Job Description

Team: IT

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior Software Engineer, Cloud Development, based in Canada.

This role sits at the core of a modern AI platform team responsible for building and operating large-scale infrastructure that powers intelligent product experiences. You will design and maintain cloud-native services that support model training, deployment, and high-throughput inference in production environments. The work spans distributed systems, Kubernetes-based orchestration, and GPU-accelerated workloads at global scale. You will contribute to the evolution of reliable, secure, and privacy-conscious AI systems used by millions of users. The environment is highly collaborative, bringing together engineering, product, infrastructure, and security teams. This is a hands-on role for someone who thrives in complex backend systems and cares deeply about performance, scalability, and operational excellence.

Accountabilities:

  • Design, build, and operate scalable platform services and APIs that support production AI and backend workloads.
  • Own service reliability end-to-end, improving availability, latency, scalability, and cost efficiency across distributed systems.
  • Develop and optimize Kubernetes-based infrastructure, including deployment pipelines, environment configuration, and resource management.
  • Improve service lifecycle practices such as packaging, versioning, testing, validation, and automated deployments.
  • Implement observability systems (metrics, logging, tracing, alerting) to strengthen operational visibility and incident response.
  • Collaborate with cross-functional teams to deliver secure, scalable, and privacy-respecting platform capabilities.
  • Participate in architectural discussions, operational processes, on-call rotations, and incident postmortems, and mentor peers.

Requirements:

  • Bachelor’s degree with 4–6+ years of relevant experience, or equivalent hands-on production systems experience.
  • Strong Python development skills with experience building maintainable services, libraries, and CLIs.
  • Proven experience running production workloads in cloud environments (GCP preferred) and managing infrastructure at scale.
  • Deep knowledge of Kubernetes and Helm, including multi-environment deployments and progressive rollouts.
  • Experience with infrastructure-as-code tools such as Terraform for provisioning and managing cloud resources.
  • Strong understanding of distributed systems, API design, and production-grade service reliability.
  • Familiarity with observability tools (e.g., Grafana) and debugging performance or reliability issues in complex systems.
  • Excellent communication skills and experience collaborating across engineering, product, and infrastructure teams.
  • On-call and incident response experience in production environments.
  • Bonus: experience with GPU workloads, Ray/Ray Serve, ML infrastructure, or multi-provider LLM systems.

Benefits:

  • Competitive performance-based bonus program with shared success model.
  • Comprehensive medical, dental, and vision coverage.
  • Strong retirement contributions with immediate 100% vesting.
  • Quarterly company-wide wellness days and additional paid holidays.
  • Home office stipend and annual professional development budget.
  • Quarterly well-being allowance for personal wellness needs.
  • Generous parental leave policies.
  • Employee referral bonuses and additional country-specific benefits (life insurance, disability coverage, EAP, etc.).

Date Posted: 06/29/2026
