L3 Cloud DevOps Engineer / Site Reliability Engineer (SRE)

NTD Software · Remote

Company

NTD Software

Location

Remote

Type

Full Time

Job Description

We are seeking an experienced L3 Cloud DevOps Engineer with a strong focus on Site Reliability Engineering (SRE) to join our team. The role centers on building and improving monitoring and alerting tools, with significant emphasis on Grafana, Prometheus, and Datadog. The ideal candidate has hands-on Python scripting experience and a solid understanding of user and system monitoring. The work involves proactive dashboard building, cross-functional collaboration, and resolving service issues through monitoring and remediation.

Requirements

  • Extensive hands-on experience with Python scripting.
  • Strong expertise in Site Reliability Engineering (SRE) practices.
  • Proficiency in Grafana, including dashboard creation and modification.
  • In-depth knowledge of Prometheus and Datadog for monitoring and alerting.
  • Experience with user and system monitoring, along with the ability to create and enhance dashboards and runbooks.
  • DevOps experience is secondary but desirable.
  • Relevant certifications or courses in Python, SRE, Grafana, and Prometheus are a plus.

Responsibilities

  • Proactively build and enhance Grafana dashboards to improve monitoring capabilities.
  • Collaborate with cross-functional teams to ensure effective monitoring and alerting.
  • Manage and respond to alerts, focusing on timely remediation and implementation of solutions for service issues.
  • Conduct user and system monitoring to identify and address potential problems.
  • Develop and maintain runbooks to support operational efficiency and incident response.
  • Use Python scripting to automate and improve DevOps and SRE processes.

Date Posted

09/14/2024

