Principal, Site Reliability Engineer
Job Description
Company: Oak Street Health
Title: Principal, Site Reliability Engineer
Location: Chicago
Role Description:
The SRE plays both a proactive and a reactive role in production. The proactive role includes partnering with development and infrastructure teams to ensure that an application's design and tooling support operability once it reaches the production environment. Once the application is in production, the SRE closely analyzes its telemetry to determine whether it is meeting its SLOs and SLAs, and works with development or infrastructure teams to address any gaps. The reactive role includes level 3 support for issues that were not detected before production, adding telemetry, monitoring, alerting, and automation to prevent recurrence.
Core Responsibilities:
- Ensure the application has telemetry so we can monitor and alert on issues to prevent customer impacts.
- Partner with application and infrastructure teams in proper capacity planning.
- Partner with application, infrastructure, and product teams to establish non-functional requirements to achieve the SLO/SLA.
- Partner with application and infrastructure teams to help introduce and automate self-healing capabilities that are tested in lower environments and executed in production.
- Introduce production readiness checks and criteria within the pipeline for non-functional requirements, i.e., whether logging correlates, performance and resiliency tests pass, and monitoring and alerting work.
- Introduce contracts and measures to establish error budgets with delivery teams, so that achievement of the SLO/SLA determines whether additional enhancements can be introduced.
- Participate in design sessions to ensure that an application's design is capable of achieving the desired non-functional requirements, such as 99.99% uptime.
- Educate operations and support teams on available tooling and capabilities to support production.
What are we looking for?
- 5+ years of development experience in Java or Python
- Experience working as level 3 support on a development team that is very familiar with support issues.
- 5+ years of experience with networks, servers, cloud computing, databases, and applications, with the ability to view the operational aspects of an application's availability holistically
- Ability to collaborate with development and infrastructure teams on stability solutions.
- Documentation skills to create instructions or workflows to allow teams to support the production environment.
- Communication skills to educate support teams on tooling needed for the production environment.
- 5+ years of automation experience for self-healing or non-functional validation testing.
What does being "Oaky" look like?
- Radiating positive energy
- Assuming good intentions
- Creating an unmatched patient experience
- Driving clinical excellence
- Taking ownership and delivering results
- Being relentlessly determined
Why Oak Street Health?
Oak Street Health is on a mission to "Rebuild healthcare as it should be," providing personalized primary care for older adults on Medicare, with the goal of keeping patients healthy and living life to the fullest. Our innovative care model is centered right in our patients' communities and focuses on the quality of care over the volume of services. We're an organization on the move! With over 150 locations and an ambitious growth trajectory, Oak Street Health is attracting and cultivating team members who embody "Oaky" values and passion for our mission.
Oak Street Health Benefits:
- Mission-focused career impacting change and measurably improving health outcomes for Medicare patients
- Paid vacation, sick time, and investment/retirement 401(k) match options
- Health insurance, vision, and dental benefits
- Opportunities for leadership development and continuing education stipends
- New centers and flexible work environments
- Opportunities for high levels of responsibility and rapid advancement
Oak Street Health is an equal opportunity employer. We embrace diversity and encourage all interested readers to apply.
Learn more at www.oakstreethealth.com/diversity-equity-and-inclusion-at-oak-street-health
Date Posted
02/24/2023