Lead Software Engineer, Site Reliability Engineering - Developer Experience (Remote USA)

Klaviyo • Boston, MA

Company

Klaviyo

Location

Boston, MA

Type

Full Time

Job Description

At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo? Visit careers.klaviyo.com to see how we empower creators to own their own destiny.

Check out this video with the hiring manager, Alan Manos, to learn more! 
 
Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem. The mission of the SRE team is to ensure uninterrupted service for Klaviyo customers and act as a force multiplier for Klaviyo product teams to deliver better software faster. 
 
The SRE teams build foundational backend services as well as tooling and automation to allow product teams to release and scale their software reliably and predictably. Lead SREs are team players who embed themselves within product teams as needed to advance the architecture and performance of software systems and train their peers in topics such as debugging distributed systems, building self-healing applications and eking out every drop of performance possible.
 
Internally, we call this role Lead Site Reliability Engineer on our Velocity Site Reliability Engineering team.

Mission and Vision of the Velocity SRE Team

Vision: Make it easy for engineers to develop applications and deploy them safely to production, with a single way of deploying our custom application code that enables hundreds of small pushes a day. Automation and standards exist for every step of the developer lifecycle, so product engineers can self-serve their containerized, cloud-based local environments and find answers to their development questions.

Mission: Optimize developer efficiency to support scaling to 500 engineers. This includes enabling containerized, cloud-based throwaway staging environments for the Klaviyo application, building out a self-service automation platform for testing and development tooling, delivering touchless CI/CD, providing Slackbots that answer common tooling questions, and establishing a single source of truth for engineering documentation.

As a Lead Site Reliability Engineer, you will own foundational Klaviyo services and make a big impact on the productivity of our product engineering teams. Klaviyo is growing fast and we have openings for all skill levels across all of our teams. Learn more about our engineering culture at https://klaviyo.tech
 
How You'll Make a Difference
  • Ship foundational services to enable Klaviyo engineering to move faster with confidence
  • Design and develop systems and processes that enable highly available & scalable systems
  • Achieve breakthroughs in systems throughput by identifying and eliminating bottlenecks
  • Leverage technology such as Python, AWS, Django, Kubernetes, Bash, Terraform, MySQL, Redis, Cassandra, and PostgreSQL to advance Klaviyo’s platform
  • Champion best practices by actively collaborating with other teams in a culture that values whiteboarding and technical design review
  • Contribute to the company in multiple areas, constantly pushing yourself to be a better engineer and to level up your peers, both within your team and across Klaviyo
  • Design, write and deliver software to dramatically improve the availability, scalability, latency, and efficiency of Klaviyo’s services
  • Participate in periodic on-call duties with a focus on solving issues when they are discovered, preventing recurrences, and minimizing alert fatigue
  • Implement architectural improvements to achieve breakthrough results in the operational scalability and reliability of Klaviyo’s systems
  • Work hand-in-hand with product-facing engineers and other SREs to ship impactful code
  • Perform quantitative analysis to understand and scale Klaviyo systems
  • Uncover and advocate for preventative, upstream solutions with internal stakeholders
  • Evangelize Site Reliability best practices across the engineering organization

Who You Are
  • 10+ years of solid experience in the SRE/DevOps field
  • BA or BS Degree in Computer Science, related field, or equivalent experience
  • Ability to stay composed during outages and to drive failures through root cause analysis to the prevention of future issues
  • Understanding of Linux (we run Ubuntu) and all layers of the networking stack
  • Experience working on an engineering team building software
  • Experience writing code using best practices in a language such as Python, Ruby, Go, etc.

Get to Know Klaviyo

Klaviyo is a world-leading marketing automation platform dedicated to accelerating revenue and customer connection for online businesses. Klaviyo makes it easy to store, access, analyze, and use transactional and behavioral data to power highly targeted customer and prospect communications. The company's hybrid customer-data and marketing-platform model allows companies to grow by fostering direct relationships with customers, without giving up their valuable data to popular big-tech ad platforms. Over 265,000 innovative companies like Unilever, Custom Ink, Living Proof, and Huckberry sell more with Klaviyo. Learn more at www.klaviyo.com.

If you are a California, Colorado, Rhode Island, Washington, New York City, or Jersey City resident and this role is a remote role, you can receive additional information about the compensation and benefits for this role, which we will provide upon request. Requests can be submitted here. Additional information regarding benefits can be found at klaviyorewards.com.

Klaviyo is committed to diversity and to a policy of equal employment opportunity and non-discrimination. We do not discriminate on the basis of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, sexual orientation or any other characteristic protected by applicable law.

Date Posted

02/22/2023
