Job Description
The role
As a Site Reliability Engineer (SRE) at company, your mandate is to ensure the availability and reliability of our most critical services and to ensure that they meet our customers’ requirements. Our SRE team is growing, so you’ll be a crucial early member, helping to establish the team, its processes, and its best practices. Success in this role looks like collaborating with other teams to build and run sustainable production systems that can evolve and adapt to the changes in our fast-paced environment.
This role is responsible for:
- Working proactively with engineering teams to help them set SLOs and implement best practices for logging and telemetry collection
- Designing, implementing, and maintaining the tools and systems that support service reliability, monitoring, and alerting
- Participating in a 24x7 on-call rotation supporting the health of our services
- Driving the incident management process and supporting a blameless post-mortem culture
- Participating in application design consulting and capacity planning
- Defining and formalizing SRE practices and helping guide the overall direction of reliability engineering
- Providing formal and informal mentorship to engineers
- Continuously optimizing systems and workflows by improving architecture, infrastructure, automation, CI/CD, and observability
- Combining software and systems knowledge to engineer reliable, scalable, and fault-tolerant high-volume distributed systems
You bring
- 5+ years of relevant industry experience, with a focus on the design, observability, operation, maintenance, and troubleshooting of distributed cloud-native systems
- 5+ years of operational experience with an observability platform such as Datadog, Splunk, Prometheus/Grafana, or AppDynamics
- Fluency in one or more programming languages (e.g., Python, TypeScript, Go)
- A strong conviction in software development best practices, including version control, automated testing, and continuous integration and delivery
- You’re self-motivated, inquisitive, and always looking to learn new technologies
- You’re a great teammate who communicates clearly and transparently
- The Triple H Factor: Humble, Hungry, and Honest
- An act-like-an-owner mentality. We have a bias toward taking action.
Date Posted
04/04/2024