Lead Site Reliability Engineer - Infrastructure

Location

Remote

Type

Full Time

Job Description

Hiring Remotely in United States
Remote or Hybrid
160K-180K Annually
Senior level
Artificial Intelligence • Other • Security • Software • Analytics • Big Data Analytics
Learn from the past. Understand the present. Predict the future.
The Role
The Lead Site Reliability Engineer will oversee the Infrastructure SRE team, focusing on system reliability, automation, and mentoring while collaborating with product engineering.
We are seeking a Lead Site Reliability Engineer (Infrastructure) to join our fast-moving VSaaS engineering organization. This role carries responsibility for technical leadership and operational execution of the Infrastructure SRE team. You will own the reliability, scalability, and operability of our shared platform and production systems while shaping how reliability engineering and SRE practices are applied across the organization and mentoring senior and staff engineers.
You will work closely with product engineering and platform teams to ensure a seamless developer experience while setting standards, driving priorities, and leading by example during incidents and high-impact operational work. This role requires a strong technical background in cloud infrastructure, distributed systems, CI/CD, and GitOps, along with hands-on development experience in Golang and/or Python to improve developer workflows, automation, and long-term system reliability.
This is a remote role in the United States.
Role Overview
Site Reliability Engineer - Infrastructure
The Infrastructure team provides leadership, direction, and accountability for platform architecture, system design, and end-to-end implementation to meet and exceed product non-functional requirements, including quality, security, reliability, availability, and performance. Site Reliability Engineers enable Product Development teams to ship features with reliable velocity by owning the stability, scalability, and operability of the underlying infrastructure and shared services.
What You Will Do:
As a Lead Site Reliability Engineer, you will:
  • Operate and evolve large-scale distributed systems, anticipating failure modes and proactively mitigating risks across production environments, while owning day-to-day production operations, including monitoring, alert triage, incident response, post-incident analysis, and critical incident coordination and documentation.

  • Lead the design, build, and implementation of automation, orchestration, and operational tooling to improve efficiency, reliability, and signal-to-noise ratio and to reduce recurring issues, minimizing service-impacting events.

  • Set technical direction and influence platform strategy by defining platform architecture, system design, and documentation to guide development, testing, deployment, and long-term maintenance of complex distributed systems.

  • Establish and enforce standards, operational rigor, and best practices for deploying, monitoring, managing, and operating cloud-native and distributed infrastructure environments.

  • Lead the adoption and execution of modern CI/CD, GitOps, and cloud-native infrastructure practices, ensuring reliable, scalable, and traceable software and infrastructure releases.

  • Mentor and develop senior and staff engineers, reinforcing SRE principles, DevOps practices, accountability, and operational excellence across the Infrastructure SRE team.

  • Collaborate closely with product and engineering stakeholders, advocating for an SRE mindset and system-level thinking to maximize reliability, performance, availability, security, and scalability across shared platforms and services.

Other duties as assigned are absorbed into the above ownership and operational responsibilities.
What You Have:
  • 10+ years of experience in site reliability engineering, infrastructure, or systems engineering, with deep ownership of large-scale production systems and demonstrated leadership of SRE or infrastructure teams, including setting technical direction and mentoring senior engineers.

  • Strong hands-on experience designing and building automation and operational tooling using Golang and/or Python, with expert-level proficiency in Linux/Unix systems, shell scripting, and production troubleshooting.

  • Advanced expertise in cloud-native and IaaS architectures, distributed systems, and container orchestration in production environments, including compliance, security, and network considerations.

  • Expertise in architecting modular Terraform frameworks and Infrastructure-as-Code (IaC) design patterns.

  • Deep understanding of SRE and DevOps principles, including incident management, SLA/SLO ownership, automation, reliability engineering practices, and leading incident response with post-incident analysis and preventive improvements.

  • Strong experience with CI/CD pipelines, GitOps workflows, release tooling, and modern cloud-native infrastructure practices, ensuring reliable and traceable software and infrastructure changes.

  • Hands-on experience operating Docker and Kubernetes environments, observability platforms (logging, monitoring, alerting), and SQL/NoSQL databases (e.g., Postgres, MongoDB, Graph DB), including performance tuning and operational troubleshooting.

Skills / Training Desired
  • Subject matter expertise in Google Cloud preferred; experience with other public cloud providers is also valuable.

  • Demonstrated expertise in microservices lifecycle management, including integration, testing, deployment, and operational best practices, supported by advanced knowledge of software release tooling and CI/CD platforms such as GitLab, Jenkins, Cloud Build, ArgoCD, and Spinnaker.

  • Deep understanding of the Docker and Kubernetes ecosystem, including orchestration, cluster management, and image lifecycle optimization.

  • Strong experience with observability, logging, and monitoring tools such as ELK Stack, Prometheus, Stackdriver, Datadog, New Relic, or Dynatrace.

  • Hands-on experience with algorithms, data structures, complexity analysis, and software/system design for large-scale distributed environments.

  • Experience driving automation for operational efficiency, signal noise reduction, recurring issue mitigation, performance testing, capacity planning, and system optimization in production environments.

  • Experience implementing security best practices and compliance considerations in infrastructure and platform design, along with the ability to influence cross-functional teams, evangelize SRE and DevOps practices, and foster a culture of reliability and operational excellence.

Why Milestone?
Milestone offers not only great benefits but also great culture. Employees here have flexible work environments, opportunities for further education, and the ability to effect change in our organization directly.
The annual salary for this position ranges from $160,000 to $180,000. Pay is based on the level, location, complexity, responsibility, and job duties of the specific position and is just one component of Milestone's total compensation package. Additionally, we offer an attractive benefits package that includes medical/dental benefits, FSA or HSA, 401(k) with 6% Safe Harbor employer match, paid parental leave, generous PTO (20 days' vacation, 10 days paid sick time, and 12 company holidays), fully paid Short Term disability policy, fully paid Long Term disability policy, and Life Insurance. If you are selected for an interview, please feel welcome to speak to our Talent Partner about our compensation philosophy.
All employees must complete a background check. Employees in fiscal roles are also required to undergo a credit check. All information obtained during these checks is handled confidentially and shared only with authorized personnel.
Milestone is committed to creating a diverse and inclusive workplace and is proud to be an equal opportunity employer.
Contact and application
Please apply at our website: www.milestonesys.com
We are looking forward to receiving your application.

Top Skills

CI/CD
Datadog
Docker
ELK Stack
GitOps
Go
Kubernetes
Linux/Unix
New Relic
NoSQL
Prometheus
Python
SQL
Stackdriver
Terraform

The Company
Lake Oswego, OR
1,500 Employees
Year Founded: 1998

What We Do

At Milestone Systems, we are dedicated to making the world see. As a leading provider of data-driven video technology software, we empower people, businesses, and societies with innovative solutions that enhance security, efficiency, and insight.

Why Work With Us

We’re proud to foster a working environment that supports well-being and growth opportunities. At the organizational level, we celebrate our team members and value their personal expertise. Everyone has access to personal growth programs and health initiatives, along with the freedom to govern their work-life balance.

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

If you live within a reasonable distance of our Lake Oswego, OR office, this will be hybrid with 3 days in the office.

Typical time on-site: 3 days a week
Lake Oswego, OR

Date Posted

04/08/2026
