Observability Automation & Integration Lead Engineer

Iron Mountain · Remote

Company

Iron Mountain

Location

Remote

Type

Full Time

Job Description

At Iron Mountain we know that work, when done well, makes a positive impact for our customers, our employees, and our planet. That's why we need smart, committed people to join us. Whether you're looking to start your career or make a change, talk to us and see how you can elevate the power of your work at Iron Mountain.

We provide expert, sustainable solutions in records and information management, digital transformation services, data centers, asset lifecycle management, and fine art storage, handling, and logistics. We proudly partner every day with our 225,000 customers around the world to preserve their invaluable artifacts, extract more from their inventory, and protect their data privacy in innovative and socially responsible ways.

Are you curious about being part of our growth story while evolving your skills in a culture that will welcome your unique contributions? If so, let's start the conversation.

Job Summary

We are actively seeking a proactive and skilled Senior Engineer specializing in Observability, Monitoring, and Automation to join our dynamic team. In this critical role, you will implement, manage, and enhance observability platforms to ensure optimal network and application performance. Your responsibilities will include configuring alerts and dashboards, building end-to-end automation, and conducting data trend analysis.

This position is ideal for individuals with a strong background in observability platform engineering, network and application performance monitoring, and automation, who are eager to leverage their technical expertise in a collaborative and fast-paced environment. Experience with observability platforms and tools like Datadog and SolarWinds is a plus. If you are passionate about building scalable systems, enhancing observability, and working with cross-functional teams, and are committed to delivering high-quality solutions, this could be the perfect opportunity for you.

Core experience/responsibilities

  • Monitoring Platform Engineering

    • 10+ years of experience with platforms such as SolarWinds, Datadog, HP OpenView, BMC, etc.

    • 10+ years of experience in network, application performance, and synthetic monitoring.

    • Expertise in configuring alerts, creating dashboards, and conducting data trend analysis.

    • Experience automating the detection of assets missing from monitoring and adding them to the monitoring ecosystem via REST APIs and scripting.

    • Proficiency in monitoring a range of devices, including routers, switches, firewalls, storage, and virtual, Windows, Linux, and UNIX servers.

    • 8+ years of experience automating infrastructure operations, including event correlation, using tools such as Ansible and Python.

    • Expertise in integrating monitoring data with other platforms such as CMDB/ServiceNow.

    • Experience configuring monitors using SNMP, SSH, WinRM, WMI, JMX, etc.

    • Ability to design and implement highly available continuous monitoring platforms for 24x7 operations.

  • Technical Solutions and Collaboration

    • Recommend baseline monitoring thresholds, KPIs, and SLAs.

    • Provide solutions to complex problems and drive process improvements.

    • Experience with both on-premises and cloud environments.

    • Expertise in advanced troubleshooting and root cause analysis.

    • Proficiency with platforms like ServiceNow, Remedy, or Assyst.

    • Identify automation opportunities and implement proactive monitoring solutions.

    • Work effectively with Enterprise Architects, OS engineers, and operations support teams to provide training, develop guidelines, and serve as a subject matter expert.

  • Design and Implementation

    • Drive enterprise tools and automation implementations while holding stakeholders accountable for their responsibilities and deliverables.

    • Participate in technical design discussions, considering trade-offs to support business value, scalability, and delivery timelines.

    • Ensure adherence to architectural governance and security standards.

    • Contribute to the design and architecture of high-performance, scalable systems, ensuring they meet business requirements and are cost-effective.

    • Create and maintain detailed design documentation, including diagrams, technical specifications, and architecture blueprints.

    • Design systems with a focus on performance optimization, ensuring minimal latency and high throughput.

    • Develop strategies to ensure system scalability, accommodating future growth and changes in workload.

    • Integrate security best practices into the design and implementation of systems, ensuring robust protection against threats.

    • Evaluate new technologies and tools, recommending their integration into the development process to enhance productivity and system capabilities.

  • Process/Operational Experience

    • Plan and execute system and software installations, upgrades, and changes across the organization.

    • Understand methodologies such as Agile and Scrum, and manage project objectives, delivery approaches, and plans.

    • Identify and mitigate risks throughout projects and tasks, addressing major design flaws.

    • Experience gathering and organizing large amounts of data for instrumentation into an enterprise monitoring solution.

    • Share knowledge of monitoring best practices with system owners and administrators to enhance overall monitoring and alerting posture.

Operational requirements

  • Available for on-call support outside of normal business hours to address critical issues.

  • Strong communication skills to relate technical details to non-technical leaders and users.

  • Promote a positive working environment, encourage teamwork, and mentor rising talent.

  • Excellent time management and organizational skills, with experience establishing guidelines for others.

  • Ability to notice discrepancies and issues as they arise and escalate them to management.

  • Facilitate discussions and explore alternative approaches to resolve conflicts.

  • Take personal accountability for decision-making and collaborating with cross-functional teams.

Nice to Have

  • Working expertise in aggregating infrastructure and application logs for ingestion into a security platform.

  • Experience with log aggregation tools such as the ELK stack (Elasticsearch, Logstash, Kibana), Splunk, or QRadar.

  • Proficiency in Ansible and Python, with the ability to create complex SQL queries for reporting and correlation.

Education

  • Bachelor's degree in Computer Science, Information Technology, or a related field or equivalent experience is required.

Category: Information Technology

Date Posted

09/06/2024

Similar Jobs

Software Engineer Networking Software and Services - xAI

The text describes xAI's mission to develop AI systems for understanding the universe and advancing human knowledge. It outlines a role involving networ...

Associate Technical Support Engineer - Recharge

Recharge is a subscription platform for innovative brands, offering customer retention solutions. They seek Technical Support roles with 24/7 coverage em...

Full Stack Product Engineer - Jiga

Jiga is a remote-friendly company focused on empowering engineers with trust, autonomy, and flexibility. They emphasize simplicity, ownership, and impactful...

Senior Design Manager (Infrastructure) - Canonical

Canonical, a leading open-source provider, seeks a Senior Design Manager to drive innovation in cloud and AI technologies. The role offers remote work glo...

Senior Product Designer - Org & Security - Typeform

This job description outlines a role in developing an intelligent contact management system with AI capabilities. The position involves designing user ...

Executive Director Patient Advocacy - Kyverna Therapeutics

Kyverna Therapeutics is seeking an Executive Director for Patient Advocacy to lead initiatives in autoimmune disease treatment. The role involves build...
