Site Reliability Engineer Technical Lead
Job Description
Are you the one? We're seeking an experienced Site Reliability Engineer to lead and mentor our SRE team. You're a seasoned professional with a proven track record in designing and implementing robust SRE processes at scale. You excel in cloud and hybrid environments, have a deep understanding of containerization, and are passionate about creating resilient, high-performance systems that can handle extreme traffic peaks. Beyond technical expertise, you're a skilled communicator and collaborator, able to bridge the gap between technical teams and stakeholders. You thrive in cross-functional environments and can effectively represent SRE concerns at the leadership level.
Responsibilities:
- Lead the implementation and refinement of SRE practices across the organization, including SLOs, error budgets, and blameless postmortems
- Design and implement automation to eliminate toil and improve system reliability and efficiency
- Lead initiatives and architect scalable hybrid cloud solutions for Web3 infrastructure
- Manage error budgets and make data-driven decisions about when to prioritize reliability vs. new features
- Drive SRE practices to ensure high availability, performance, and reliability under varying load conditions
- Collaborate closely with the Platform engineering team to build reliability into services from the ground up
- Collaborate closely with Nethermind's Infrastructure Leadership department to align SRE strategies with the overall technical vision
- Drive the adoption of observability best practices and implement comprehensive monitoring systems
- Develop and maintain service level indicators (SLIs) and objectives (SLOs), working with product owners to define appropriate reliability targets
- Mentor team members in SRE practices and foster a culture of continuous learning
- Lead capacity planning efforts, using quantitative analysis to predict and address future scaling challenges
- Contribute to long-term technical roadmaps, balancing reliability concerns with product innovation
Skills:
- 5+ years of experience in Site Reliability Engineering or DevOps
- Expert knowledge of cloud platforms (AWS, GCP)
- Expert knowledge of Kubernetes
- Proven experience in designing and implementing scalable, efficient, resilient systems
- Deep understanding of Linux/Unix systems and networking protocols
- Strong programming skills in Python or Go
- Strong background in monitoring, observability, and logging systems (e.g. Grafana, Prometheus, Loki)
- Expertise in CI/CD tools (e.g. GitHub Actions, ArgoCD)
- Excellent communication skills, both written and verbal, with the ability to explain complex technical concepts to various audiences
- Experience in producing technical documentation, runbooks, presentations, and postmortem reports
- Experience in and passion for mentoring and upskilling team members
Nice to have:
- Experience leading technical teams
- Contributions to open-source projects or thought leadership in SRE
- Familiarity with MLOps and big data technologies
- Knowledge of blockchain technology and infrastructure
- Experience with chaos engineering principles and tools
- Familiarity with traffic management and CDN technologies
- Systems or backend engineering background
Date Posted
09/14/2024