DevOps - Senior Site Reliability Engineer
Job Description
Team: IT
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a DevOps - Senior Site Reliability Engineer in Brazil.
As a Senior Site Reliability Engineer (SRE), you will play a critical role in ensuring the stability, scalability, and efficiency of cloud-based SaaS platforms. You will collaborate closely with application engineering teams to improve CI/CD pipelines, automate manual processes, and enhance observability and monitoring practices. Your work will directly reduce operational risk, increase deployment frequency, and strengthen disaster recovery capabilities. The role offers a dynamic environment where your technical expertise will not only drive infrastructure improvements but also elevate the engineering skills and practices of the teams around you. You will have the opportunity to shape best practices, implement robust automation, and contribute to a culture of operational excellence.
Accountabilities:
- Design, maintain, and improve CI/CD pipelines using Jenkins and GitHub to ensure reliable software delivery.
- Automate repetitive operational tasks to minimize manual toil and reduce deployment risk.
- Partner with engineering teams to define SLOs and SLIs and implement monitoring best practices using Datadog.
- Improve observability through dashboards, alerts, logs, and signal quality enhancements to reduce alert fatigue.
- Support disaster recovery planning, backup validation, and alignment with defined RTO and RPO objectives.
- Contribute to infrastructure-as-code practices and ensure SOC2-aligned operations and audit readiness.
- Mentor peers by sharing patterns, elevating technical standards, and promoting best practices across teams.
Requirements:
- Proven experience in SRE, DevOps, or infrastructure engineering within production SaaS environments.
- Advanced hands-on experience with AWS core services, including Lambda, EventBridge, SNS, SES, S3, ALB, and ECS.
- Expertise in CI/CD pipeline design and operation using Jenkins and GitHub.
- Hands-on experience with Datadog for monitoring, alerting, log management, and SLO/SLI implementation.
- Proficiency in infrastructure-as-code tools such as Terraform, CloudFormation, or CDK.
- Strong programming or scripting skills in Python, Go, or Bash.
- Experience with disaster recovery planning, testing, and audit-aligned operational processes.
- Strong collaboration, problem-solving, and mentorship abilities to elevate team performance.
Benefits:
- Flexible work arrangements: fully remote with autonomy over your schedule.
- Competitive salary and performance-based incentives.
- Professional growth opportunities through exposure to cloud and DevOps best practices.
- Health, dental, and life insurance coverage.
- Access to wellness programs and resources for physical and mental well-being.
- Support for home office setup and tools to enhance productivity.
- Opportunities to participate in innovation initiatives and technical knowledge sharing.
Date Posted
04/08/2026