We’re building Astra Serverless, the next generation of distributed, scalable, fault-tolerant serverless NoSQL data services, powered by Apache Cassandra and extended with native vector and AI capabilities across multi-cloud environments.
Our customers depend on our platform to serve real-time, mission-critical workloads at global scale. Ensuring reliability, performance, and correctness under unpredictable workloads is a non-trivial challenge, and that’s where you come in.
As an engineer on the Quality Engineering and Performance team, you’ll develop and evolve the system-level testing frameworks that validate a distributed database-as-a-service at massive, AI-driven workload scale. You’ll help ensure that new features, performance improvements, and AI-driven extensions meet the highest standards of scalability and resilience.
Why this role?
You’ll work at the intersection of distributed systems engineering and test architecture, hands-on designing and building automation and frameworks that simulate complex multi-cloud deployments, chaos scenarios, and performance stress conditions.
This is not QA as usual: you’ll engineer the test systems that validate an elastic database platform capable of scaling to thousands of non-uniform nodes, self-healing under failure, and integrating real-time vector search and analytics.
If you thrive on deep technical challenges, bring curiosity along with analytical and systems thinking, and enjoy building tools other engineers rely on, this role will feel like home.
What you’ll do
- Design and develop frameworks for end-to-end and chaos testing of distributed, serverless, Cassandra-based systems.
- Engineer automation that validates data correctness, fault tolerance, and performance under complex multi-region and multi-cloud topologies.
- Collaborate closely with peers on local and remote feature development teams to model real-world scenarios and integrate automated validation into the delivery pipeline.
- Continuously evolve the test infrastructure for scale, speed, and observability, leveraging Kubernetes, Docker, and cloud-native toolchains.
- Profile and tune distributed workloads to uncover systemic bottlenecks and verify that service-level goals are consistently met.
- Contribute code to shared testing frameworks and participate in design and code reviews across teams.
- Own the full cycle of quality engineering — from test design and execution to insights and continuous improvement.
What you’ll bring
- Exposure to system-level Java and Python development for testing distributed or cloud systems, including replication, partitioning, consistency, and eventual convergence.
- Eagerness to learn about and apply chaos testing, fault injection, or resilience validation.
- Ability to analyze complex logs and metrics to isolate performance and reliability issues.
- Familiarity with Linux, Kubernetes, Docker, and CI/CD pipelines (Jenkins, GitHub Actions, etc.).
- Familiarity with NoSQL technologies (Cassandra, DynamoDB, ScyllaDB, etc.), cloud platforms (AWS, GCP, Azure), and multi-cloud topologies.
- A curiosity-driven mindset, strong communication skills, and a focus on collaboration and craftsmanship.
- Understanding of vector search, AI embeddings, or data-intensive workloads.