In this role, you will lead initiatives to design, build, and optimize performance and CI automation frameworks for large-scale distributed storage systems.
You will collaborate with global teams across development, QE, and infrastructure to drive continuous performance improvements, build intelligent CI pipelines, and integrate AI-based analytics into quality engineering.
This position offers the opportunity to influence architecture decisions, define benchmarking standards, and shape IBM’s enterprise-grade distributed storage validation ecosystem.
- Lead end-to-end performance engineering for distributed storage systems and CI frameworks.
- Design and develop scalable automation tools for CI/CD, benchmarking, and performance analytics.
- Build and maintain workload generators, monitoring dashboards, and test frameworks for large-scale environments.
- Triage, analyze, and resolve performance issues across compute, storage, and network layers.
- Integrate AI/ML-driven insights into CI processes for predictive validation and anomaly detection.
- Collaborate with upstream and internal teams to define KPIs, metrics, and performance objectives.
- Mentor engineers on performance optimization, automation, and observability best practices.
- 12+ years of experience in Performance Engineering, Quality Engineering, or Automation Architecture.
- Strong programming and scripting skills in Python, Bash, or Go.
- Hands-on experience with distributed storage systems (Ceph, GlusterFS, MinIO, or similar).
- Deep understanding of Linux internals, networking, and storage I/O performance.
- Proficiency in CI/CD frameworks (Jenkins, GitLab CI, or similar).
- Experience with cloud infrastructure platforms (OpenStack, OpenShift, or Kubernetes).
- Familiarity with performance and workload tools (FIO, COSBench, vdbench, or equivalent).
- Experience with infrastructure automation (Ansible, Terraform).
- Strong problem-solving and analytical skills across distributed environments.
- Experience integrating AI/ML-driven frameworks for CI analytics or performance insights.
- Knowledge of monitoring and observability tools (Grafana, Prometheus, ELK stack).
- Experience with container performance tuning and Kubernetes-based workloads.
- Understanding of cloud networking and distributed storage architectures.
- Contributions to open-source or distributed systems communities.