Manual compaction for segments #225

Open
hicder opened this issue Dec 26, 2024 · 0 comments
@hicder
Owner

hicder commented Dec 26, 2024

Problem

Right now, if a collection has multiple segments, a query goes through each segment and runs a search on it (Link). That means query latency suffers when a collection has a lot of segments.

This task adds a way to merge segments when a collection has too many of them.
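To make the cost concrete, here is a minimal sketch of a per-segment fan-out. It is not the project's actual search path: `Segment`, `search_collection`, and the brute-force distance scan are hypothetical stand-ins, assumed only for illustration. The point is that every query pays for one search per segment plus a final merge of the partial results.

```rust
/// Hypothetical stand-in for a segment: a flat list of (doc id, vector) pairs.
pub struct Segment {
    pub vectors: Vec<(u64, Vec<f32>)>,
}

fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl Segment {
    /// Brute-force top-k inside one segment (assumption: a real segment uses an index).
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        let mut hits: Vec<(u64, f32)> = self
            .vectors
            .iter()
            .map(|(id, v)| (*id, squared_l2(query, v)))
            .collect();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(k);
        hits
    }
}

/// One search per segment, then a merge of the partial top-k lists.
/// The loop body runs once per segment, so work grows with the segment count.
pub fn search_collection(segments: &[Segment], query: &[f32], k: usize) -> Vec<(u64, f32)> {
    let mut all = Vec::new();
    for segment in segments {
        all.extend(segment.search(query, k));
    }
    all.sort_by(|a, b| a.1.total_cmp(&b.1));
    all.truncate(k);
    all
}
```

With fewer, larger segments the loop in `search_collection` runs fewer times, which is the motivation for compaction.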

Detail

Compaction will merge segments based on some criteria:

  • We first query the collection to get its list of segments
  • Users can then choose which segments to merge
  • In the future, we will extend this with a background thread that does this automatically, but for now, let's focus on letting users trigger compaction manually.

This task will contain:

  • Add a gRPC to list the segments a collection has
  • Add a gRPC to send a compact command with a list of segments
  • Each of these gRPCs will have a corresponding API in the Collection struct
  • Implement the logic to merge 2 segments (you'll need to familiarize yourself with builders and writers)
  • Implement the logic to swap segments atomically

You'll need to understand the locking mechanism to make sure the swap doesn't introduce a data race.
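As a starting point for discussion, the `Collection`-side API and the atomic swap could be sketched roughly as below. Everything here is an assumption rather than the existing code: the `Segment` fields, `merge_segments` (a placeholder for the real builder/writer-based merge), the method names `list_segments` and `compact`, and the choice of an `RwLock` around an `Arc`'d segment list. The gRPC handlers are left out because the service definitions are not part of this issue; they would just call these two methods.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical stand-in for an immutable segment.
#[derive(Debug)]
pub struct Segment {
    pub name: String,
    pub doc_ids: Vec<u64>,
}

/// Placeholder merge; the real logic would go through the builders and writers.
fn merge_segments(name: &str, inputs: &[Arc<Segment>]) -> Segment {
    let mut doc_ids: Vec<u64> = inputs
        .iter()
        .flat_map(|s| s.doc_ids.iter().copied())
        .collect();
    doc_ids.sort_unstable();
    doc_ids.dedup();
    Segment { name: name.to_string(), doc_ids }
}

type SegmentList = Arc<Vec<Arc<Segment>>>;

pub struct Collection {
    /// Readers clone the inner Arc and search that snapshot without holding the lock.
    segments: RwLock<SegmentList>,
    /// Serializes compactions so two of them never pick the same inputs.
    compaction: Mutex<()>,
}

impl Collection {
    pub fn new(segments: Vec<Segment>) -> Self {
        let list: Vec<Arc<Segment>> = segments.into_iter().map(Arc::new).collect();
        Collection {
            segments: RwLock::new(Arc::new(list)),
            compaction: Mutex::new(()),
        }
    }

    /// Cheap snapshot of the current segment list.
    pub fn snapshot(&self) -> SegmentList {
        let guard = self.segments.read().expect("segment lock poisoned");
        Arc::clone(&*guard)
    }

    /// Backs the "list segments" gRPC.
    pub fn list_segments(&self) -> Vec<String> {
        self.snapshot().iter().map(|s| s.name.clone()).collect()
    }

    /// Backs the "compact" gRPC: merge the named segments into `merged_name`.
    pub fn compact(&self, names: &[&str], merged_name: &str) -> Result<(), String> {
        let _only_one = self.compaction.lock().expect("compaction lock poisoned");
        let wanted: HashSet<&str> = names.iter().copied().collect();
        if wanted.len() < 2 {
            return Err("need at least two distinct segments".to_string());
        }
        let inputs: Vec<Arc<Segment>> = self
            .snapshot()
            .iter()
            .filter(|s| wanted.contains(s.name.as_str()))
            .cloned()
            .collect();
        if inputs.len() != wanted.len() {
            return Err("unknown segment in compaction request".to_string());
        }
        // Slow part: runs without the write lock, so queries are not blocked.
        let merged = Arc::new(merge_segments(merged_name, &inputs));

        // Fast part: rebuild the list from whatever is current and swap it in one step.
        let mut guard = self.segments.write().expect("segment lock poisoned");
        let mut next: Vec<Arc<Segment>> = guard
            .iter()
            .filter(|s| !wanted.contains(s.name.as_str()))
            .cloned()
            .collect();
        next.push(merged);
        *guard = Arc::new(next);
        Ok(())
    }
}
```

In this sketch a query that took a snapshot before the swap keeps searching the old segments, and a query that starts afterwards sees only the merged one, so no reader observes a half-updated list. Whether that matches the existing locking mechanism in the codebase needs to be checked before adopting this shape.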

hicder added the "good first issue" label Jan 15, 2025
hicder changed the title from "Compactor segment engine" to "Manual compaction for segments" Jan 15, 2025