Engineering Team Lead, Site Reliability Engineer
Job Description
Role Description
Givebutter is hiring a New York City-based Site Reliability Team Lead to oversee the reliability, scalability, and performance of our systems. As a Lead SRE, you will be directly responsible for delivering world-class infrastructure to our users, maturing our operational practices, and leading a team of skilled engineers. You will report directly to our CTO and carry out our infrastructure vision while creating a scalable engineering culture that breeds innovation. You will ensure we deliver excellent user experiences in a timely manner while maintaining top-notch security, design, and performance. You will cultivate a culture of high performance by creating systems that eliminate roadblocks, establishing processes that incentivize excellence, and being an expert in site reliability engineering. We have already built a great foundation, powering hundreds of millions of donations to more than 10,000 organizations, and you will take this impact much further.
Why join the Givebutter Engineering team?
Democracy of code - We are a group of engineers that values equal contribution as well as discussing architecture and ideas openly.
Not overburdened with meetings - Our engineers manage their own calendars and block time so they can work uninterrupted.
Automated CI/CD - Our builds are reproducible and the pipeline is easy to manage. Shipping to production is hands-off, automated, and consistent. Our engineers are focused on solving problems with code.
Mission-driven, full stop - We work with amazing organizations, non-profits, and charities doing good all over the world.
Responsibilities
- Hire and manage in-house SREs and contractors.
- Handle and prioritize incidents, ensuring timely resolution and effective communication.
- Establish and manage key metrics for reliability; set up and maintain alerting systems.
- Automate tasks and manage infrastructure using Infrastructure as Code (IaC) tools and techniques.
- Ensure application scalability and identify performance bottlenecks to optimize system performance.
- Design and implement fault-tolerant and highly available systems to minimize downtime.
- Develop, implement, and regularly test disaster recovery plans to ensure business continuity.
- Conduct capacity planning to anticipate and manage future infrastructure needs.
- Define, measure, and maintain SLOs and SLAs to meet service performance expectations.
- Ensure the security of applications through best practices and conduct regular penetration tests to identify and mitigate vulnerabilities.
Requirements
- 5+ years of experience building and deploying production infrastructure at scale
- 5+ years of experience working with AWS
- Knowledge of PHP
- Aware of trends and best practices in SRE and cloud infrastructure
- 2+ years of experience managing system architecture, ensuring best practices for reliability, performance, and security
- Strong technical leadership, mentorship, and communication skills
- Experience working for a product-led growth company is beneficial
- Experience managing a remote engineering team
Date Posted
09/14/2024