Job Description
We are seeking a Lead Site Reliability Engineer (Infrastructure) to join our fast-moving VSaaS engineering organization. This role carries responsibility for technical leadership and operational execution of the Infrastructure SRE team. You will own the reliability, scalability, and operability of our shared platform and production systems while shaping how reliability engineering and SRE practices are applied across the organization and mentoring senior and staff engineers.
You will work closely with product engineering and platform teams to ensure a seamless developer experience while setting standards, driving priorities, and leading by example during incidents and high-impact operational work. This role requires a strong technical background in cloud infrastructure, distributed systems, CI/CD, and GitOps, along with hands-on development experience in Golang and/or Python to improve developer workflows, automation, and long-term system reliability.
This is a remote role in the United States.
Role Overview
Site Reliability Engineer - Infrastructure
The Infrastructure team provides leadership, direction, and accountability for platform architecture, system design, and end-to-end implementation to meet and exceed product non-functional requirements, including quality, security, reliability, availability, and performance. Site Reliability Engineers enable Product Development teams to ship features with reliable velocity by owning the stability, scalability, and operability of the underlying infrastructure and shared services.
What You Will Do:
As a Lead Site Reliability Engineer, you will:
- Operate and evolve large-scale distributed systems, anticipating failure modes and proactively mitigating risks across production environments, while owning day-to-day production operations, including monitoring, alert triage, incident response, post-incident analysis, and critical incident coordination and documentation.
- Lead the design, build, and implementation of automation, orchestration, and operational tooling to improve efficiency, reliability, and signal-to-noise ratio, and to reduce recurring issues, minimizing service-impacting events.
- Set technical direction and influence platform strategy by defining platform architecture, system design, and documentation to guide development, testing, deployment, and long-term maintenance of complex distributed systems.
- Establish and enforce standards, operational rigor, and best practices for deploying, monitoring, managing, and operating cloud-native and distributed infrastructure environments.
- Lead the adoption and execution of modern CI/CD, GitOps, and cloud-native infrastructure practices, ensuring reliable, scalable, and traceable software and infrastructure releases.
- Mentor and develop senior and staff engineers, reinforcing SRE principles, DevOps practices, accountability, and operational excellence across the Infrastructure SRE team.
- Collaborate closely with product and engineering stakeholders, advocating for an SRE mindset and system-level thinking to maximize reliability, performance, availability, security, and scalability across shared platforms and services.
Other duties as assigned are absorbed into the above ownership and operational responsibilities.
What You Have:
- 10+ years of experience in site reliability engineering, infrastructure, or systems engineering, with deep ownership of large-scale production systems and demonstrated leadership of SRE or infrastructure teams, including setting technical direction and mentoring senior engineers.
- Strong hands-on experience designing and building automation and operational tooling using Golang and/or Python, with expert-level proficiency in Linux/Unix systems, shell scripting, and production troubleshooting.
- Advanced expertise in cloud-native and IaaS architectures, distributed systems, and container orchestration in production environments, including compliance, security, and network considerations.
- Expertise in architecting modular Terraform frameworks and Infrastructure-as-Code (IaC) design patterns.
- Deep understanding of SRE and DevOps principles, including incident management, SLA/SLO ownership, automation, reliability engineering practices, and leading incident response with post-incident analysis and preventive improvements.
- Strong experience with CI/CD pipelines, GitOps workflows, release tooling, and modern cloud-native infrastructure practices, ensuring reliable and traceable software and infrastructure changes.
- Hands-on experience operating Docker and Kubernetes environments, observability platforms (logging, monitoring, alerting), and SQL/NoSQL databases (e.g., Postgres, MongoDB, Graph DB), including performance tuning and operational troubleshooting.
Skills / Training Desired
- Subject matter expertise in Google Cloud preferred; experience with other public cloud providers is also valuable.
- Demonstrated expertise in microservices lifecycle management, including integration, testing, deployment, and operational best practices, supported by advanced knowledge of software release tooling and CI/CD platforms such as GitLab, Jenkins, Cloud Build, ArgoCD, and Spinnaker.
- Deep understanding of the Docker and Kubernetes ecosystem, including orchestration, cluster management, and image lifecycle optimization.
- Strong experience with observability, logging, and monitoring tools such as ELK Stack, Prometheus, Stackdriver, Datadog, New Relic, or Dynatrace.
- Hands-on experience with algorithms, data structures, complexity analysis, and software/system design for large-scale distributed environments.
- Experience driving automation for operational efficiency, signal noise reduction, recurring issue mitigation, performance testing, capacity planning, and system optimization in production environments.
- Experience implementing security best practices and compliance considerations in infrastructure and platform design, along with the ability to influence cross-functional teams, evangelize SRE and DevOps practices, and foster a culture of reliability and operational excellence.
Why Milestone?
Milestone offers not only great benefits but also a great culture. Employees here have flexible work environments, opportunities for further education, and the ability to effect change in our organization directly.
The annual salary for this position ranges from $160,000 to $180,000. Pay is based on the level, location, complexity, responsibility, and job duties of the specific position and is just one component of Milestone's total compensation package. Additionally, we offer an attractive benefits package that includes medical/dental benefits, FSA or HSA, 401k with 6% Safe Harbor employer match, paid parental leave, generous PTO (20 days' vacation, 10 days paid sick time, and 12 company holidays), fully paid Short Term disability policy, fully paid Long Term disability policy, and Life Insurance. If you are selected for an interview, please feel welcome to speak to our Talent Partner about our compensation philosophy.
All employees must complete a background check. Employees in fiscal roles are also required to undergo a credit check. All information obtained during these checks is handled confidentially and shared only with authorized personnel.
Milestone is committed to creating a diverse and inclusive workplace and is proud to be an equal opportunity employer.
Contact and application
Please apply at our website: www.milestonesys.com
We are looking forward to receiving your application.
What We Do
At Milestone Systems, we are dedicated to making the world see. As a leading provider of data-driven video technology software, we empower people, businesses, and societies with innovative solutions that enhance security, efficiency, and insight.
Why Work With Us
We’re proud to foster a working environment that supports well-being and growth opportunities. At the organizational level, we celebrate our team members and value their personal expertise. Everyone has access to personal growth programs and health initiatives, along with the freedom to govern their work-life balance.
Milestone Systems Offices
Hybrid Workspace
Employees engage in a combination of remote and on-site work.
If you live within a reasonable distance of our Lake Oswego, OR office, this will be hybrid with 3 days in the office.
Date Posted
04/08/2026