Manager, Site Reliability Engineering
Job Description
At Talkdesk, we are courageous innovators focused on redefining the customer experience, making the impossible possible for companies globally. We champion an inclusive and diverse culture representative of the communities in which we live and serve. And, we give back to our community by volunteering our time, supporting non-profits, and minimizing our global footprint. Each day, thousands of employees, customers, and partners all over the world trust Talkdesk to deliver a better way to great experiences.
We are recognized as a cloud contact center leader by many of the most influential research organizations, including Gartner and Forrester. With $498 million in total funding, a valuation of more than $10 billion, and a ranking of #8 on the Forbes Cloud 100 list, now is the time to be part of the Talkdesk legacy and help accelerate our success in a new decade of transformational growth.
At Talkdesk, we embrace FAST, our fundamental operating principles that define who we are as an organization. These principles drive us to make the impossible possible. FAST: Focus + Accountability + Speed = Talkdesker.
- Focus: Focus time, energy and attention on what is most impactful for the business, and be thoughtful about how and when to partner with others.
- Accountability: Hold self and others accountable to meet commitments and drive results. Accept responsibility for successes and failures.
- Speed: Execute with agility and urgency. Act promptly, decisively, and without delay. Make good and timely decisions that keep the organization moving forward.
- Talkdesker: YOU!
As a Manager, Site Reliability Engineering, you will set the standard, track the deliverables, and own the outcome of your team. You will also play a critical role in ensuring the reliability, availability and performance of Talkdesk infrastructure and mission-critical services. You will collaborate with development teams to design, build, and maintain scalable and resilient systems. Your primary focus will be on improving and automating operations, improving observability, monitoring system health, and responding to incidents to meet telco-grade service reliability.
Responsibilities:
- Lead a team of Senior SREs in owning our operational environments
- Organize and drive regular fire drills with engineering teams to practice and validate on-call and operational procedures
- Own the on-call rotation for your team, including incident response, resolution, post-mortems, and system and process improvements.
- Ensure post-mortem analyses take place, participate in them, and drive mitigation actions to prevent recurrence of incidents.
- Collaborate with software engineering teams to promote best practices in terms of reliability, performance, and scalability during the development and operation lifecycle.
- Continuously improve system performance and reliability through proactive system monitoring, capacity planning and performance tuning.
- Improve and maintain system observability and alerting to identify and respond to potential issues proactively and reduce Mean Time to Detect (MTTD).
- Guide a team in building and maintaining tools for automation, configuration management, and deployment to improve operational efficiency and reliability and reduce Mean Time to Repair (MTTR).
Requirements:
- Degree in Computer Science, Information Technology, or related fields.
- 3+ years of experience as a Site Reliability Engineer or DevOps Engineer for cloud platforms (e.g. AWS, Google, Azure).
- 3+ years of Linux experience.
- 3+ years of experience with cloud platforms and container technologies (e.g. Docker, Kubernetes).
- Expertise in at least one programming language (e.g. Python, Java/Spring, Ruby)
- Deep experience with automation tools (e.g. Ansible, Terraform).
- Solid understanding of networking constructs such as firewalls, application firewalls, load balancers
- Solid understanding of distributed systems and microservices architecture at scale.
- Experience with monitoring tools (e.g. New Relic, Prometheus, Grafana, ELK stack) and incident management systems (e.g. Splunk).
- Experience with messaging systems (e.g. Kafka, RabbitMQ, Amazon MQ)
Work Environment and Physical Requirements:
Primarily office-environment work, extended periods of sitting or standing, computer-based work. Limited lifting, and equipment usage limited to computer-related equipment (keyboards, mouse, etc.)
The Talkdesk story hinges on empathy and acceptance. It is the shared goal among all Talkdeskers to empower a new kind of customer hero through our innovative software solution, and we firmly believe that the best path to success for our mission is inclusivity, diversity, and genuine acceptance. To that end, we will hire, promote, work alongside, cheer for, bond with, and warmly welcome into the Talkdesk family all persons without regard to ethnic and racial identity, indigenous heritage, national origin, religion, gender, gender identity, gender expression, sexual orientation, age, disability, marital status, veteran status, genetic information, or any other legally protected status.
Date Posted
10/26/2023