Lead Site Reliability Engineer

hims & hers · USA

Company

hims & hers

Location

USA

Type

Full Time

Job Description

About the Role:

We are seeking a Lead Site Reliability Engineer to help build a reliable web experience for our users. We believe that moving fast is our competitive advantage and enables us to better serve our users. We also know that the faster we move, the more likely we are to break things.

You Will:

  • Design and implement SRE practices ensuring availability, scalability, and observability of production systems, with a strong focus on excellent customer experience

  • Actively seek and identify opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation

  • Use automation extensively to design, configure, manage, and monitor systems in support of our product development teams

  • Apply an understanding of infrastructure and infrastructure automation (Infrastructure as Code)

  • Manage incidents and emergency response, track outages, ensure data integrity, and engineer releases to promote safe, efficient, and rapid deployments

  • Handle emergency response, either by being on call or by reacting to symptoms surfaced by monitoring, and escalate when needed

  • Improve the codebase by resolving logic issues, deprecating unused code, etc.

  • Implement monitoring, logging, alerting, and SLO reporting

  • Identify Service Level Indicators (SLIs) that will align the team to meet the availability and performance objectives

  • Perform and run blameless RCAs on incidents and outages, aggressively looking for answers that will prevent incident recurrence

  • Provide reviews on design documents from internal and external teams

  • Perform more complex tasks using highly specialized knowledge and advanced business experience

  • Resolve complex tickets in creative ways

  • Develop and lead large, highly complex cross-functional projects or programs

  • Determine solutions to blockers, identifying tasks and developing solutions as appropriate

  • Be responsible for at least one major delivery domain and accountable for all aspects of SRE for that domain

  • Develop standards, tools, and knowledge requirements for skill and career development

You Have:

  • 10+ years as a software engineer shipping production code

  • 5+ years of experience as a Site Reliability Engineer or Production Support Engineer

  • Bachelor's degree in Computer Science, Engineering, or a related field, or relevant years of work experience

  • Experience with service-oriented architectures and microservices at scale

  • Strong proficiency with RDBMS databases (PostgreSQL, MySQL, SQL Server, etc.)

  • Strong proficiency in SQL scripting

  • Proficiency developing in one or more languages such as Java, Kotlin, Python, and/or others

  • Ability to use containers and orchestration frameworks (Kubernetes, Docker, container registries, etc.)

  • Knowledge of CDN, TypeScript frameworks, and GQL

  • Knowledge and good understanding of pub/sub or queue-based messaging systems

  • Proficiency in Git or other VCS

  • Experience configuring, customizing, and extending monitoring tools (Datadog, Prometheus, New Relic, etc.)

  • Excellent debugging and troubleshooting skills

  • Strong technical competency with a data-driven, analytical approach to solving complex challenges

  • A systematic problem-solving approach coupled with strong, effective communication skills and a sense of drive

  • Nice-to-have: Experience with Terraform or other IaC tools such as Chef, Puppet, or Ansible

Our Benefits (there are more, but here are some highlights):

  • Competitive salary & equity compensation for full-time roles

  • Unlimited PTO, company holidays, and quarterly mental health days

  • Comprehensive health benefits including medical, dental & vision, and parental leave

  • Employee Stock Purchase Program (ESPP)

  • Employee discounts on hims & hers and Apostrophe online products

  • 401(k) benefits with employer matching contribution

  • Offsite team retreats

Date Posted

08/14/2024
