Senior Site Reliability Engineer
Job Description
Team: IT
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in Brazil.
This role sits at the core of a fast-scaling logistics technology environment where reliability, performance, and automation are critical to powering large-scale distributed systems. You will help design and operate the internal platform that enables engineering teams to deliver high-quality software with confidence and speed. The position blends infrastructure engineering, cloud operations, and software reliability practices in a highly collaborative global setup. You will take ownership of mission-critical systems while continuously improving observability, incident response, and system resilience. Working closely with multiple engineering squads, you will influence architectural decisions and drive platform-wide reliability initiatives. This is a high-impact role where your work directly strengthens system stability, efficiency, and scalability across the organization.
Accountabilities:
You will be responsible for ensuring the reliability, scalability, and performance of critical infrastructure and platform services while enabling engineering teams to operate efficiently in production environments.
- Design, deploy, and operate scalable cloud-based systems while balancing reliability, cost, and development velocity
- Own and improve SLIs/SLOs, ensuring platform services consistently meet reliability targets
- Lead incident response, root-cause analysis, and postmortem processes to prevent recurring issues
- Build and enhance observability through monitoring, logging, and alerting frameworks
- Support infrastructure-as-code and automation initiatives to improve deployment consistency and efficiency
- Collaborate with engineering teams to improve system design, performance, and operational readiness
- Contribute to CI/CD pipelines, deployment strategies, and release engineering practices
- Provide production support, including occasional off-hours incident handling when required
Requirements
You bring strong hands-on experience in cloud infrastructure, DevOps, and site reliability engineering, with the ability to operate in complex distributed environments.
- 5+ years of experience in SRE, DevOps, or Cloud Engineering roles
- Strong expertise in AWS, Kubernetes, Docker, and modern cloud-native architectures
- Proficiency in Linux/UNIX systems administration and production troubleshooting
- Experience with infrastructure-as-code tools such as Terraform, Ansible, or Chef
- Strong programming/scripting skills (Python, Bash, or similar) for automation and tooling
- Solid understanding of networking, system design, and distributed systems principles
- Experience with monitoring, logging, and incident management tools and practices
- Familiarity with CI/CD pipelines and DevOps best practices
- Exposure to PostgreSQL or database operations is a plus
- Strong English communication skills and ability to work in global, distributed teams
- Problem-solving mindset with high ownership, initiative, and attention to detail
Benefits
- Competitive base salary aligned with market standards
- Equity package with ownership opportunities in a high-growth tech environment
- Unlimited PTO and flexible time-off policy
- Remote-first setup within Brazil
- Opportunity to work on large-scale distributed systems in a global engineering organization
- Collaborative, high-impact engineering culture focused on innovation and continuous improvement
Date Posted
05/26/2026