Job Description
Title: Sr. Site Reliability Engineer
Location: Remote (US)
Who We Are:
Rackner is a software consultancy that builds cloud-native solutions for startups, enterprises, and the public sector. We are an energetic, growing team with a passion for solving big problems. We enable digital transformation for large organizations through the newest distributed technologies, with a focus on end-to-end application development, DevSecOps, AI/ML, and systems architecture. Our methodology emphasizes cloud-first, cost-effective innovation. Our customers come from a diverse, ever-growing list of industries.
Position Overview:
The Senior Site Reliability Engineer is a senior software developer who ensures DevSecOps principles are followed throughout the entire software delivery lifecycle. Working across multiple product teams to understand the overall state of health of a product and/or program, the SRE will resolve any issues and ensure the product team is in a healthy state. Considered an expert on the CI/CD process and the overall vision of the program, the SRE will combine engineering experience and an innate drive to improve existing systems and processes with the creativity to develop novel solutions to evolving challenges.
Responsibilities:
- Manage and maintain availability and reliability of critical platform services and applications, ensuring they meet requirements of internal and external users
- Collaborate with business leaders in building and running sustainable production systems that can evolve and adapt to changes in a global business environment
- Evaluate performance results and recommend major changes affecting short-term project growth and success
- Respond to incidents that impact Platform One availability, and provide support for service engineers with customer incidents
- Run infrastructure with Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes
- Build monitoring that alerts on symptoms rather than on outages
- Document every action so your findings turn into repeatable actions and then into automation
- Use the GitLab product to run GitLab.com as a first resort and improve the product as much as possible
- Improve operational processes
- Design, build, and maintain core infrastructure that enables GitLab to scale to support hundreds of thousands of concurrent users
- Debug production issues across services and levels of the stack
- Plan the growth of Platform One’s infrastructure (e.g. Party Bus, Iron Bank, Big Bang)
Requirements:
- Experience in technical customer service and customer management
- Be available to respond to incidents that impact Platform One availability, and provide support for service engineers with customer incidents
- Possess and apply comprehensive knowledge across key tasks and high-impact assignments
- Experience planning and leading major technology assignments
- Experience serving as a technical expert across teams and tasks
- Experience creating and maintaining documentation for implementations
Skills/Qualifications:
- Solid customer service and communication skills
- Self-motivated/self-starter
- Ability to obtain a Secret Security Clearance
Additional Info/Benefits:
Rackner embraces and promotes employee development and training, and covers the cost of certifications relevant to a position and the technologies/services provided. Fitness/gym membership eligibility, a weekly pay schedule, and employee swag, snacks, and events are offered as well!
- 401K with 100% matching up to 6%
- Highly competitive PTO
- Great health insurance with large network of providers
- Medical/Dental/Vision
- Life insurance and short- and long-term disability
- Industry-Leading Weekly Pay Schedule
- Home office & equipment plan
Date Posted: 02/02/2023