CASSANDRA-21126: Vector search support in cassandra-easy-stress #86

Open
dracarys09 wants to merge 2 commits into apache:main from dracarys09:vector-load-generator

dracarys09 commented Jan 27, 2026

Summary of the changes

  • Add VectorSearch workload for benchmarking Cassandra 5.0+ vector search (ANN) capabilities
  • Support both synthetic random vectors and realistic datasets via HDF5 files (SIFT, GloVe, etc.)
  • Implement recall@K calculation with ground truth comparison for measuring search quality
  • Add configurable similarity functions (COSINE, EUCLIDEAN, DOT_PRODUCT) and vector dimensions
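
For readers unfamiliar with the three similarity options, the sketch below shows the textbook definitions in Kotlin. It is illustrative only: the function names are hypothetical, none of it is taken from this PR, and Cassandra's own similarity functions may scale or normalize their scores differently.

```kotlin
import kotlin.math.sqrt

// Hypothetical helpers showing the textbook definitions; not code from this PR.

/** Dot product of two equal-length vectors. */
fun dotProduct(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "vectors must have the same dimension" }
    var sum = 0.0
    for (i in a.indices) sum += a[i].toDouble() * b[i].toDouble()
    return sum
}

/** Cosine of the angle between two vectors; undefined for a zero vector. */
fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    val normA = sqrt(dotProduct(a, a))
    val normB = sqrt(dotProduct(b, b))
    require(normA > 0.0 && normB > 0.0) { "cosine similarity is undefined for a zero vector" }
    return dotProduct(a, b) / (normA * normB)
}

/** Euclidean (L2) distance; smaller values mean the vectors are closer. */
fun euclideanDistance(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "vectors must have the same dimension" }
    var sum = 0.0
    for (i in a.indices) {
        val d = a[i].toDouble() - b[i].toDouble()
        sum += d * d
    }
    return sqrt(sum)
}
```

Note that cosine similarity and dot product grow with similarity while Euclidean distance shrinks, so ground truth computed under one metric is generally not valid for another.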

Testing

  • Run ./gradlew test --tests "org.apache.cassandra.easystress.workloads.VectorSearchTest"
  • Verify workload runs against a Cassandra 5.0+ cluster with random vectors
  • Test with an HDF5 dataset (e.g., sift-128-euclidean.hdf5) and verify recall metrics are logged
  • Confirm ktlint passes: ./gradlew ktlintCheck

Comment on lines 278 to 279
val denominator = minOf(limit, relevantTruth.size).coerceAtLeast(1)
val recall = hits.toDouble() / denominator
Member

This reports recall as 0 when relevantTruth.size is 0. Would it make more sense to exclude such a query from the recall averages and instead track and report the number of queries for which none of the ground truth results were inserted into the table?

Author

That's a good point, thanks for the review. I've updated the code to track such queries separately so that they no longer skew the recall averages.
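
One way the approach discussed in this thread could look is sketched below. This is not the code from the updated commit: RecallTracker, record, scoredQueries, skippedQueries and averageRecall are hypothetical names, result IDs are assumed to be Int, and the class is not thread-safe, so a real workload would need synchronization or atomic counters.

```kotlin
// Hypothetical sketch of the idea discussed above; not the PR's implementation.

/**
 * Accumulates recall@K across queries. Queries whose ground-truth neighbours
 * were never inserted into the table are counted separately and left out of
 * the average instead of being scored as 0.
 */
class RecallTracker {
    private var recallSum = 0.0

    var scoredQueries = 0
        private set

    var skippedQueries = 0
        private set

    fun record(returned: List<Int>, relevantTruth: Set<Int>, limit: Int) {
        require(limit > 0) { "limit must be positive" }
        if (relevantTruth.isEmpty()) {
            // Nothing to compare against: track the query, but do not score it.
            skippedQueries++
            return
        }
        // Consider only the first `limit` results and ignore duplicate IDs.
        val hits = returned.take(limit).distinct().count { it in relevantTruth }
        val denominator = minOf(limit, relevantTruth.size)
        recallSum += hits.toDouble() / denominator
        scoredQueries++
    }

    /** Mean recall over scored queries, or null if no query could be scored. */
    fun averageRecall(): Double? =
        if (scoredQueries == 0) null else recallSum / scoredQueries
}
```

Because relevantTruth is known to be non-empty on the scoring path, the denominator no longer needs coerceAtLeast(1), and reporting skippedQueries alongside the average shows how much of the query set the recall figure actually covers.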
