Lead Site Reliability Engineer
Company
McAfee
Location
Other US Location
Type
Full Time
Job Description
Role Overview:
As a Site Reliability Engineer (SRE) Technical Lead, you will be instrumental in overseeing the reliability, availability, and performance of our production environments at an advanced level. You will lead initiatives in proactive monitoring and management of incidents, fostering a culture of rapid resolution and minimal service disruption. Your extensive troubleshooting, log data analysis and debugging skills will facilitate close collaboration with DevOps, Engineering, and internal support teams, allowing us to achieve the highest levels of customer satisfaction.
Key responsibilities include:
- Demonstrate proficiency in Amazon Web Services (AWS) cloud technology, with solid hands-on experience in major services such as ALB, NLB, Athena, VPC, EC2, RDS, and CloudWatch, including log analysis with Athena queries.
- Drive proof-of-concept (POC) efforts for application performance monitoring (APM) solutions, or bring solid hands-on experience setting up Prometheus and Grafana monitoring in a microservices-based environment.
- Provide tailored monitoring solutions for critical consumer-based environments by eliminating false positives and scripting in PromQL or another monitoring query language to automate monitoring capabilities.
- Lead efforts to troubleshoot, debug, and escalate issues through thorough, detailed CloudWatch Logs analysis, enhancing overall service availability and reliability.
- Provide detailed analysis by pulling log metrics, and deliver in-depth insight into issues by correlating them with the trends and frequencies observed during web/API troubleshooting.
- Leverage your extensive experience in the AWS cloud computing platform, including EC2, S3, EBS, VPC, ELB, AMI, SNS, RDS, IAM, Route 53, and Auto Scaling, to drive service scalability and performance optimization.
- Independently analyze cost utilization across AWS services, recommend optimizations, and see them through to proper implementation.
- Oversee the deployment of code updates across test and production environments, facilitating seamless rollouts of enhancements.
- Track and escalate all critical production issues through designated tracking applications, maintaining the integrity of service delivery.
- Direct operations on an EKS/Kubernetes-based setup, including onboarding, maintaining, and decommissioning services, in close partnership with the Core DevOps team.
- Manage GitHub pull requests (PRs), ensuring that pipelines for Kubernetes configuration changes are triggered and analyzed efficiently.
- Spearhead root cause analyses for production incidents, implementing and advocating for long-term solutions to persistent challenges.
- Exhibit robust hands-on troubleshooting expertise with Kubernetes clusters, Pods, and services.
- Champion collaboration with peer teams across various geographic locations, fostering a cohesive team environment.
- Lead root cause analysis efforts for assigned incidents, ensuring that insights gained are documented and shared across teams.
About You:
We are seeking a seasoned Site Reliability Engineer (SRE) Technical Lead who excels in dynamic environments and is deeply committed to delivering superior customer experiences. Your experience in the observability space and your extensive problem-solving and troubleshooting capabilities will be pivotal in enhancing our service reliability at a strategic level.
Key qualifications include:
- 8+ years of experience in the web and e-commerce domain, with a specific focus on cloud hosting (primarily AWS).
- A passion for log analysis and for getting to the root of issues through various observability platforms, with the ability to efficiently trace troubleshooting calls end to end.
- Solid hands-on experience with Prometheus/Grafana or another APM tool, including tweaking and tuning monitoring capabilities to meet requirements.
- Strong analytical skills with a proactive approach to assessing issues and their impacts on systems, exhibiting enthusiasm for troubleshooting challenges.
- Exceptional interpersonal, written, and verbal communication skills that facilitate impactful collaboration within teams and across the organization.
- A track record of innovative thinking and a willingness to propose and initiate significant service improvements based on data-driven analyses.
- Experience developing automation tools or scripts that optimize processes and reduce manual intervention, including work that leverages Athena or CloudWatch Logs Insights queries.
- A collaborative leadership style that values teamwork and promotes collective success throughout the organization.
- An adaptable mindset toward change, along with a strong interest in exploring and implementing the latest technologies.
- Self-motivated, results-oriented, and strategically adept, with an ability to drive meaningful improvements in service delivery.
Company Overview
McAfee is a leader in personal security for consumers. Focused on protecting people, not just devices, McAfee consumer solutions adapt to users’ needs in an always-online world, empowering them to live securely through integrated, intuitive solutions that protect their families and communities with the right security at the right moment.
Company Benefits and Perks:
We work hard to embrace diversity and inclusion and encourage everyone at McAfee to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.
- Bonus Program
- Pension and Retirement Plans
- Medical, Dental and Vision Coverage
- Paid Time Off
- Paid Parental Leave
- Support for Community Involvement
We're serious about our commitment to diversity which is why McAfee prohibits discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation or any other legally protected status.
Date Posted
12/12/2024