Lead Site Reliability Engineer, Observability

Company

ThousandEyes (part of Cisco)

Location

Remote

Type

Full Time

Job Description

Who We Are

The name ThousandEyes was born from two big ideas: the power to see things not ordinarily possible and the ability to collect insights from a multitude of vantage points. As organizations rely more on cloud services and the Internet, the network has become a black box they can't understand. Our Internet and cloud intelligence platform delivers the only collectively powered view of the Internet, cloud, and SaaS platforms, helping enterprises and service providers work together to identify problems before they impact revenue, damage brand reputation, or halt employee productivity.

In August 2020, Cisco Systems completed the acquisition of ThousandEyes, which now forms the ThousandEyes Business Unit within Cisco’s Network Services Business Group and is a foundational component of Cisco’s growing Observability business.

About The Role

The Site Reliability Engineering team focused on Observability is responsible for providing the tools, services, and infrastructure to monitor and observe the ThousandEyes platform. Using cloud-native tools like Prometheus, Grafana, Kibana, and even ThousandEyes itself, we enable our developers to instrument, analyze, and monitor their applications. The engineer in this role will work with the team to own our logging pipeline and monitoring stack while working with developers to continuously improve our view of the platform.

Responsibilities
  • Design and implement visibility into our platform as we grow to multi-region scale.
  • Design, deploy, and maintain cloud-native monitoring services in AWS and GCP that are elastic and resilient to failure.
  • Provide standards and best practices for instrumentation of container-based services and cloud-managed services.
  • Maintain our alerting pipeline so that we are notified of the right things, at the right time, in the right places.
  • Drive automation wherever possible, enabling our monitoring platforms to scale effortlessly. Think self-service.
  • Participate in, and help improve, our 24x7 incident response and on-call rotation.

 

Required skills
  • Strong Infrastructure as Code skills, ideally with Terraform and Kubernetes.
  • Strong knowledge of modern logging tool sets, including Logstash or Fluentd.
  • Understanding of Prometheus and its ecosystem, including Alertmanager.
  • Good knowledge of Application Performance Monitoring tools and crash reporting tools, such as Sentry.
  • Good knowledge of cloud provider managed services, and how they can be leveraged in our context.
  • Ability to write high-quality code in Python, Go, or equivalent languages.

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case by case basis, qualified applicants with arrest and conviction records. 


Why Cisco

#WeAreCisco, where each person is unique, but we bring our talents to work as a team and make a difference powering an inclusive future for all.

We embrace digital, and help our customers implement change in their digital businesses. Some may think we’re “old” (36 years strong) and only about hardware, but we’re also a software company. And a security company. We even invented an intuitive network that adapts, predicts, learns and protects. No other company can do what we do; you can’t put us in a box! But “Digital Transformation” is an empty buzz phrase without a culture that allows for innovation, creativity, and yes, even failure (if you learn from it).

Day to day, we focus on the give and take. We give our best, give our egos a break, and give of ourselves (because giving back is built into our DNA). We take accountability, take bold steps, and take difference to heart. Because without diversity of thought and a dedication to equality for all, there is no moving forward.

So, you have colourful hair? Don’t care. Tattoos? Show off your ink. Like polka dots? That’s cool. Pop culture geek? Many of us are. Passion for technology and changing the world? Be you, with us.

We recognize that diverse teams make the strongest teams, and we encourage people from all backgrounds to apply.


Cisco COVID-19 Vaccination Requirements

The health and safety of Cisco's employees, customers, and partners is a top priority. Our goal is to protect against and mitigate the spread of COVID-19 infection for strong business resiliency during the pandemic. Therefore, Cisco may require new hires to be fully vaccinated against COVID-19 if the role requires business-related travel, meeting with customers/partners (including visiting third-party sites on behalf of Cisco), attending trade events, or Cisco office entry, unless otherwise prohibited by applicable law, and in countries where COVID-19 vaccination is legally required. The company will consider legally required accommodations/exceptions for medical, religious, and other reasons as per the requirements of the role and in accordance with applicable law. Additional information about the requirements and accommodation process will be provided to candidates at the offer stage, based on region.

Date Posted

11/03/2022
