A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for IBMers, so the door is always open for those who want to grow their careers.
IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
Introduction
We're building Astra Serverless, the next generation of distributed, scalable, fault-tolerant serverless NoSQL data services, powered by Apache Cassandra and extended with native vector and AI capabilities across multi-cloud environments.
Our customers depend on our platform to serve real-time, mission-critical workloads on a global scale. Ensuring reliability, performance, and correctness under unpredictable workloads is a non-trivial challenge, and that's where you come in.
As an engineer on the Quality Engineering and Performance team, you'll develop and evolve the system-level testing frameworks that validate a distributed database-as-a-service at massive, AI-driven workload scale. You'll help ensure that new features, performance improvements, and AI-driven extensions meet the highest standards of scalability and resilience.
Why this role?
You'll work at the intersection of distributed systems engineering and test architecture, hands-on designing and building automation and frameworks that simulate complex multi-cloud deployments, chaos scenarios, and performance stress conditions.
This is not QA as usual: you'll engineer the test systems that validate an elastic database platform capable of scaling to thousands of non-uniform nodes, self-healing under failure, and integrating real-time vector search and analytics.
If you thrive on deep technical challenges, bring curiosity along with analytical and systems thinking, and enjoy building tools other engineers rely on, this role will feel like home.
What You'll Do
* Design and develop frameworks for end-to-end and chaos testing of distributed, serverless, Cassandra-based systems.
* Engineer automation that validates data correctness, fault tolerance, and performance under complex multi-region and multi-cloud topologies.
* Collaborate closely with feature development teams to model real-world scenarios and integrate automated validation into the delivery pipeline.
* Continuously evolve the test infrastructure for scale, speed, and observability, leveraging Kubernetes, Docker, and cloud-native toolchains.
* Profile and tune distributed workloads to uncover systemic bottlenecks and verify that service-level goals are consistently met.
* Contribute code to shared testing frameworks and participate in design and code reviews across teams.
* Own the full cycle of quality engineering, from test design and execution to insights and continuous improvement.
What You Bring
* Exposure to system-level Java and Python development for testing distributed or cloud systems, including replication, partitioning, consistency, and eventual convergence.
* Familiarity with Linux, Kubernetes, Docker, and CI/CD pipelines (Jenkins, GitHub Actions, etc.).
* Ability to analyze complex logs and metrics to isolate performance and reliability issues.
* Eagerness to learn about and apply chaos testing, fault injection, or resilience validation.
* Familiarity with NoSQL technologies (Cassandra, DynamoDB, ScyllaDB, etc.), cloud platforms (AWS, GCP, Azure), and multi-cloud topologies.
* Understanding of vector search, AI embeddings, or data-intensive workloads.
* A curiosity-driven mindset, strong communication skills, and a focus on collaboration and craftsmanship.