Staff SRE
Job Description
Virtasant is a leading cloud consulting services provider. We focus heavily on lift & shift, cloud-native development, cloud cost optimization, and migration services. As a consulting company, we often face the challenge of creating an engineering team in a matter of a week or two. To do that, we have created a secondary support business that runs a talent network and provides staffing services.
We are seeking a highly skilled and experienced Staff Site Reliability Engineer (SRE) to join our dynamic team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our critical systems and services. As a Staff SRE, you will play a pivotal role in shaping infrastructure for our client and driving initiatives that improve overall service quality.
Key Responsibilities:
-
System Design and Architecture:
-
Design, build, and maintain scalable and reliable infrastructure.
-
Collaborate with engineering teams to ensure systems are designed with reliability and scalability in mind.
-
Evaluate and integrate new technologies to enhance our infrastructure.
-
Monitoring and Incident Management:
-
Implement and maintain monitoring and alerting systems to detect and respond to issues promptly.
-
Lead incident response efforts, ensuring quick resolution and effective communication.
-
Conduct post-incident reviews and drive improvements based on findings.
-
Automation and Optimization - Reduce SRE Toil:
-
Architect and build innovative automation projects (preferably in Python/Go) from scratch to help reduce day-to-day SRE toil.
-
Create Bash scripts to automate mundane manual activities such as upgrades, status checks, and deployments.
-
Develop and maintain infrastructure as code (IaC) using tools such as Terraform, Ansible, or similar.
-
Automate repetitive tasks and processes to improve efficiency and reduce manual intervention.
-
Collaboration and Mentorship:
-
Collaborate with cross-functional teams to deliver high-quality products and services.
-
Mentor and guide junior SREs and other team members.
-
Advocate for best practices in reliability engineering across the organization.
-
Continuous Improvement:
-
Drive initiatives to improve service reliability, capacity, and performance.
-
Participate in capacity planning and disaster recovery exercises.
-
Stay current with industry trends and emerging technologies.
Qualifications:
-
Education and Experience:
-
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
-
8+ years of industry experience as a Software Engineer, SRE, or Platform Engineer.
-
3+ years of experience as a Platform Engineer or SRE.
-
Proven experience in managing large-scale, mission-critical infrastructure.
-
Technical Skills:
-
Deep understanding of Linux/Unix systems and networking.
-
Proficiency in at least one programming language (e.g., Python, Go, Java).
-
Intermediate to expert-level skill in Bash scripting.
-
Experience with cloud platforms (AWS, Azure, GCP), containers, and container orchestration (Docker, Kubernetes).
-
Strong knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
-
Familiarity with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
-
Soft Skills:
-
Excellent problem-solving skills and a proactive attitude.
-
Strong communication and collaboration skills.
-
Ability to work independently and as part of a team.
-
Demonstrated leadership and mentoring abilities.
-
Candidates must be able to work during Pacific time hours (8am - 5pm PST) and be open to an on-call rotation.
Recruitment process
-
Recruiter Screen (30 min)
-
Technical Interview (45 min)
-
Hiring Manager Interview (30 min)
We strive to move efficiently from step to step so the recruitment process can be as fast as possible.
What we offer
-
Fully remote, 40 hours/week.
-
Long-term contract
-
Payment in USD
-
PTO
-
Training and certification opportunities on AWS, GCP, and/or Azure.
Date Posted
07/15/2024