Site Reliability Engineer II - Apptio

IBM · US Bellevue

Company

IBM

Location

US Bellevue

Type

Full Time

Job Description

Introduction
At IBM, work is more than a job – it’s a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you’ve never thought possible. Are you ready to lead in this new era of technology and solve some of the world’s most challenging problems? If so, let’s talk.

Your Role and Responsibilities
You:
You are passionate about observability, automation, and reliability. Your team can count on you to deliver creative and inventive solutions to hard problems. You are comfortable working with developers, senior leadership, and non-technical individuals to help provide value to the broader organization. You take opportunities to fix problems, mentor your peers, and step outside your comfort zone to develop your skillset.
Us:
Apptio Targetprocess empowers businesses to adopt and scale agile across the enterprise. We develop an Agile tool that connects teams, products, and portfolios to business objectives using SAFe, LeSS, and other Agile frameworks. In the 2021 Gartner Magic Quadrant for Enterprise Agile Planning Tools report, Apptio’s recently acquired Targetprocess was recognized as a “Leader”.
SRE Team:
The Apptio Targetprocess SRE team’s main responsibility is to make sure that the company’s infrastructure and applications run smoothly and stably. We count on our site reliability engineers (SREs) to give our users a rich feature set, high availability, and stellar performance as they pursue their missions. That mostly means working proactively on system reliability: preventing outages, keeping an eye on key metrics, taking urgent mitigation measures when needed, and assisting other teams on infrastructure-related topics.
On a typical day in this role, you will interact with Kubernetes, Docker, Helm, Elasticsearch, DataDog, Grafana, Sensu, Puppet, Ansible/AWX, AWS, Azure, Python/Bash/PowerShell, and Terraform/Terragrunt. If you don’t know all of these tools, don’t worry: we don’t expect you to know them all, and we understand that technology evolves quickly.

Major Responsibilities:

  • Scale systems sustainably through mechanisms like automation.
  • Own the monitoring system.
  • Maintain services in production by measuring and monitoring availability, latency, and overall system health.
  • Support application expansion and horizontal scaling.
  • Work closely with developers, support, and QA teams on maintaining and improving the whole lifecycle of services.
  • Practice sustainable incident response and blameless post-mortems.
  • Provide primary operational support and engineering for multiple large distributed software applications.


Required Technical and Professional Expertise

  • Familiarity with Site-Reliability Engineering
  • Ability to thrive in autonomy
  • Knowledge of configuration management tools (e.g. Ansible or Puppet)
  • Experience with any scripting language (Bash, Python, PowerShell, etc.)
  • Experience with containerization (e.g. Docker, Podman, etc.)
  • Experience with container orchestration tools (e.g. Kubernetes, OpenShift, Docker Swarm, etc.)
  • Experience with database administration and management (MS SQL Server, PostgreSQL, MongoDB)
  • Familiarity with public cloud providers such as AWS, Azure, or IBM Cloud
  • Experience with monitoring, observability, and logging (e.g. DataDog, Prometheus, Grafana, ELK stack, Loki, etc.)
  • Familiarity with RESTful systems and their APIs
  • Experience with high-level programming languages (Golang, .NET, Java, etc.) is a plus
  • Mentoring peers and sharing skills


Preferred Technical and Professional Expertise

  • Ability to thrive in autonomy
  • Experience in a large-scale distributed Linux/Unix or Windows environment is a plus
  • Mentoring peers and sharing skills
  • Great communication skills

Date Posted

09/17/2024

