SRE Architect

IBM • IN Bangalore

Type

Full Time

Job Description

Introduction
A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.
IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.

Your Role and Responsibilities
As an Architect for Site Reliability Engineering, your focus is to ensure that the designed solution meets non-functional requirements such as reliability, availability, performance, security, and maintainability. You will work closely with the development team and other related release and extended support teams.
  • You will bring a strong engineering focus to operations, applying your leadership to identify methods for preventing incidents and improving observability, automation frameworks, self-service infrastructure, logging and metrics, and operational reports.
  • You will be expected to use tools for logging, monitoring, event management, notification, Runbook Automation, ChatOps, and Root Cause Analysis.
  • You will work with Automation Engineers, QA Engineers, and the development team to ensure seamless delivery of our service offerings.
  • You will build sufficient expertise in the IBM Cloud control plane to create automated monitoring processes.

In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting to deploying the latest software updates and fixes.

Your primary responsibilities include:

  • 24×7 Observability: Be part of a worldwide team that monitors the health of production systems and services around the clock, ensuring continuous reliability and optimal customer experience.
  • Cross-Functional Troubleshooting: Collaborate with engineering teams to provide initial assessments and possible workarounds for production issues. Troubleshoot and resolve production issues effectively.
  • Deployment and Configuration: Leverage continuous integration and continuous delivery (CI/CD) tools to deploy services and configuration changes at enterprise scale.
  • Security and Compliance Implementation: Implement security measures that meet or exceed industry standards and regulations such as GDPR, SOC2, ISO 27001, PCI, HIPAA, and FBA.
  • Maintenance and Support
  • Keeping your assigned site or service up and running, or getting it back up and running quickly when failure occurs
  • Working closely with internal partners and teams to ensure that our infrastructure meets security, SLA, and performance requirements
  • Writing, updating, and using documentation, including runbooks/playbooks
  • Automating work, including infrastructure needs, testing, failover solutions, failure mitigation, and much more
  • Debugging complex problems across an entire stack and creating solid solutions
  • Developing CI/CD processes to improve cadence
  • Persistently testing application and infrastructure resiliency across a variety of error conditions
  • Partnering with security engineers to develop plans and automation that respond aggressively and safely to new risks and vulnerabilities
  • Developing, communicating, and monitoring standard processes to promote the long-term health and sustainability of operational development tasks
  • Standing up and maintaining pre-production and developer environments to support the entire development organization and improve overall team velocity
  • Using metrics and analytics to identify reliability issues and remove them through automation and tooling
  • Being an advocate for our customers, providing them with self-diagnosing tools to resolve common issues that arise in the field


Required Technical and Professional Expertise

  • 10+ years of SRE/Level 3 support experience
  • A solid understanding of cloud infrastructure and operations
  • Expertise in Linux internals
  • Experience debugging complex problems
  • Experience designing, building, and operating large-scale production systems
  • Expertise in Ansible, Bash, and core Python development
  • Strong familiarity with one of C, C++, Go, Python, or Java
  • Experience with container technologies such as Docker and Kubernetes
  • Experience with standard industry tools for monitoring and observability
  • Experience automating infrastructure, configuration management, testing, and deployments using tools like Ansible and Chef, and the ability to explain the Infrastructure as Code paradigm
  • A strong understanding of diverse infrastructure platforms and infrastructure concepts
  • Hands-on experience using source control and feature branching strategies
  • An understanding of networking and messaging, especially between services
  • Good experience in infrastructure operations, automation, and IT Service Management, with hands-on exposure to data center administration, configuration, incident management, and support
  • Strong communication skills


Preferred Technical and Professional Expertise

  • IBM Cloud API knowledge
  • Behavior Driven Development
  • Experience in Software Development Life Cycle, Test Driven Development, Continuous Integration, and Continuous Delivery
  • Familiarity with cloud deployment tooling such as Razee and LaunchDarkly

Date Posted

11/21/2024
