Senior Staff Site Reliability Engineer at Block

Company Description
Block is one company built from many blocks, all united by the same purpose of economic empowerment. The blocks that form our foundational teams - People, Finance, Counsel, Hardware, Information Security, Platform Infrastructure Engineering, and more - provide support and guidance at the corporate level. They work across business groups and around the globe, spanning time zones and disciplines to develop inclusive People policies, forecast finances, give legal counsel, safeguard systems, nurture new initiatives, and more. Every challenge creates possibilities, and we need different perspectives to see them all. Bring yours to Block.
Job Description
As a Senior Staff Site Reliability Engineer at Block, you will be a key player in maintaining and improving the reliability of our systems. Your primary focus will be on designing, building, and maintaining scalable, reliable infrastructure and services. You will work closely with development and operations teams to ensure high availability, performance, and capacity of our services. As an early hire of this newly formed team, you will have a significant role in building the foundation that will keep Block's infrastructure reliable for years to come.
You Will:

Develop and implement strategies for improving system reliability and performance.
Design, build, and maintain scalable and reliable infrastructure using AWS and Kubernetes.
Troubleshoot and debug complex issues in a distributed environment.
Collaborate with development teams to promote best practices for reliability, scalability, and performance.
Conduct root cause analysis of incidents and implement preventive measures.
Monitor system performance and capacity, proactively identifying and addressing potential issues.
Mentor and provide guidance to junior SREs and other team members.
Participate in on-call rotations to provide 24/7 support for critical systems.
Continuously improve processes and tools to enhance system reliability and operational efficiency.

Qualifications
You Have:

12+ years of experience in site reliability engineering or a related field.
Extensive experience with AWS, including services such as EC2, S3, RDS, and Lambda.
Strong expertise in Kubernetes and container orchestration.
Proven experience in designing, building, and maintaining highly available and scalable systems.
Strong debugging and troubleshooting skills, with a focus on root cause analysis.
Proficiency in programming languages such as Python, Go, or Java.
Experience with infrastructure as code tools such as Terraform or CloudFormation.
Solid understanding of monitoring and observability tools, such as Datadog.
Excellent communication and collaboration skills.
Ability to mentor and lead junior team members.

Preferred:

Experience with CI/CD pipelines and tools such as Jenkins, GitLab, or Buildkite.
Knowledge of security best practices and tools.
Experience with database management and optimization.
Familiarity with service mesh architectures, such as Istio or Linkerd.
Understanding of networking concepts and protocols.
