Skip to content

[ENH]: wire GC v2 to new cleanup modes & call FinishDatabaseDeletion from garbage collector #4671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 30, 2025

Conversation

codetheweb
Copy link
Contributor

@codetheweb codetheweb commented May 29, 2025

Description of changes

Changes:

  • Adds two new cleanup modes, DryRunV2 and DeleteV2. When the default cleanup mode is set to one of these, or when it is set per tenant, the v2 garbage collector orchestrator will be used. Otherwise, the old orchestrator will be used. The default remains the same (the old orchestrator in dry run mode).
  • Adds a config field to the garbage collector config for the root manager cache config. Has a default.
  • Calls FinishDatabaseDeletion at the end of each GC cycle. This will transition eligible soft deleted databases -> hard deleted databases. I added a test for this.

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

n/a

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@codetheweb codetheweb force-pushed the feat-call-database-hard-delete-from-gc branch from 3f44d7a to b021664 Compare May 29, 2025 19:20
@codetheweb codetheweb changed the title [ENH]: call FinishDatabaseDeletion from garbage collector [ENH]: wire GC v2 to new cleanup modes & call FinishDatabaseDeletion from garbage collector May 29, 2025
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from a897895 to 89363e7 Compare May 29, 2025 21:12
@codetheweb codetheweb force-pushed the feat-call-database-hard-delete-from-gc branch from f93cb18 to 0184cd6 Compare May 29, 2025 21:12
@codetheweb codetheweb marked this pull request as ready for review May 29, 2025 21:17
Copy link
Contributor

propel-code-bot bot commented May 29, 2025

Garbage Collector V2 Modes, Soft/Hard Delete for Databases, and Per-Tenant Cleanup Control

This PR introduces two new garbage collection modes (DryRunV2 and DeleteV2) that activate a new v2 orchestrator for collection and file cleanup in selected tenants, while the default remains the classic dry-run. It also refactors database deletion to utilize soft delete followed by a new system-wide FinishDatabaseDeletion operation (rpc) triggered post-garbage collection, enabling safe hard deletion of databases that are both soft deleted and have no collections. The changes span config structures, orchestrator switching logic, sysdb gRPC/api, the GC workflow, and integration tests covering the new database deletion flow.

Key Changes:
• Added DryRunV2 and DeleteV2 CleanupModes to select new garbage collector orchestrator logic per default or per-tenant config.
• Introduced v2 orchestrator wiring in Rust GC, switching based on mode.
• Converted database deletion to a soft/hard model (first mark as deleted, then later hard-delete eligible DBs if no remaining collections).
• Added a new FinishDatabaseDeletion gRPC service and implementations across Go coordinator, metastore, and Rust sysdb-including plumbing through proto, dao, and service layers.
• Extended config (Rust GC) with a root cache section and default.
• Significant test additions in Rust (integration, property, orchestrator) as well as upgrades to Go service/database/collection tests to cover the new lifecycle and correctness.

Affected Areas:
• Garbage Collector (Rust): orchestrators, config, modes, workflow, tests
• SysDB: gRPC proto, Rust client/server, Go coordinator/dao
• Database and collection soft/hard deletion logic
• Testing: Rust property/integration tests, Go service tests

Potential Impact:

Functionality: Enables per-tenant and system-wide selection of new GC logic, changes DB deletion semantics (no longer immediately hard-delete), and ensures databases are only finally removed once all collections are gone and past cutoff.

Performance: No direct regression expected-v2 orchestrator may have efficiency differences for complex fork-graphs. Soft/hard delete pattern minimizes DB contention during deletions.

Security: No security issues introduced; hard deletes are only performed on eligible databases.

Scalability: Prepares the system for larger/multi-tenant topologies by decoupling database cleanup from immediate collection drop actions.

Review Focus:
• Switching logic for orchestrator type and per-tenant GC mode
• Correctness and idempotency of new soft/hard DB deletion workflow (across Go and Rust)
• Enum/proto compatibility for new APIs and their propagation to all consumers
• Completeness of updated/integrated tests
• Potential for regressions in complex tenant/version/fork scenarios

Testing Needed

• Verify GC v1/v2 path selection by adjusting default and per-tenant config and observing orchestrator
• Trigger soft/hard DB deletes and confirm databases are only hard deleted once collections are gone and cutoff expired
• Run end-to-end and property-based tests covering collection, version file, sparse index, and file deletion flows
• Check multi-tenant GC and proper cache setup via config

Code Quality Assessment

all test files: Thorough unit, integration, and property-based tests added or extended; edge cases well-covered

rust/garbage_collector/src/garbage_collector_component.rs: Well-structured and extensively modified; handles orchestrator switching and FinishDatabaseDeletion invocation

go/pkg/sysdb/metastore/db/dao/database.go: Soft/hard deletion implementation is clear; proper use of transactions and new method for eligibility check

idl/chromadb/proto/coordinator.proto: Proto extended in a backward-compatible manner; enums and new messages for finish logic are correct

rust/garbage_collector/src/config.rs: Config upgrade backwards compatible, with new field correctly defaulted

rust/garbage_collector/src/types.rs: CleanupMode extension and new response struct properly scoped

Best Practices

Error Handling:
• Consistent transaction handling and error bubbling across languages

API Versioning:
• Extends proto and service logic non-breaking, uses new RPC for new semantics

Config Management:
• Backward-compatible config upgrade, defaults preserved

Testing:
• Comprehensive property and integration testing for all major flows, includes E2E and property-based approaches

Potential Issues

• If a tenant's per-tenant cleanup mode is misconfigured, collections may not be cleaned as intended
• Potential subtle migration issues if upgrading from a system using only hard deletes (requires careful test validation)
• Concurrency edge cases in soft/hard DB deletion must be observed under high load or cascades
• Complex collection fork trees may stress v2 orchestrator and soft-deleted DB detection

This summary was automatically generated by @propel-code-bot

@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 89363e7 to 5e896ee Compare May 29, 2025 21:43
@codetheweb codetheweb force-pushed the feat-call-database-hard-delete-from-gc branch from 0184cd6 to 7922809 Compare May 29, 2025 21:43
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch 2 times, most recently from 05cf039 to 52d3755 Compare May 29, 2025 22:39
@codetheweb codetheweb changed the base branch from feat-gc-soft-delete-database to graphite-base/4671 May 29, 2025 23:24
Copy link

graphite-app bot commented May 29, 2025

Merge activity

  • May 29, 11:27 PM UTC: This pull request can not be added to the Graphite merge queue. Please try rebasing and resubmitting to merge when ready.
  • May 29, 11:27 PM UTC: Graphite disabled "merge when ready" on this PR due to: a merge conflict with the target branch; resolve the conflict and try again..
  • May 29, 11:27 PM UTC: This pull request can not be added to the Graphite merge queue. Please try rebasing and resubmitting to merge when ready.
  • May 29, 11:27 PM UTC: Graphite disabled "merge when ready" on this PR due to: a merge conflict with the target branch; resolve the conflict and try again..

@codetheweb codetheweb force-pushed the feat-call-database-hard-delete-from-gc branch from 7922809 to f5dbbcd Compare May 29, 2025 23:34
@codetheweb codetheweb changed the base branch from graphite-base/4671 to main May 29, 2025 23:34
@codetheweb codetheweb merged commit 3b76d4d into main May 30, 2025
72 checks passed
@codetheweb codetheweb deleted the feat-call-database-hard-delete-from-gc branch May 30, 2025 02:15
Inventrohyder pushed a commit to Inventrohyder/chroma that referenced this pull request Aug 5, 2025
…from garbage collector (chroma-core#4671)

## Description of changes

Changes:

- Adds two new cleanup modes, `DryRunV2` and `DeleteV2`. When the
default cleanup mode is set to one of these, or when it is set per
tenant, the v2 garbage collector orchestrator will be used. Otherwise,
the old orchestrator will be used. The default remains the same (the old
orchestrator in dry run mode).
- Adds a config field to the garbage collector config for the root
manager cache config. Has a default.
- Calls `FinishDatabaseDeletion` at the end of each GC cycle. This will
transition eligible soft deleted databases -> hard deleted databases. I
added a test for this.

## Test plan

_How are these changes tested?_

- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes

_Are all docstrings for user-facing APIs updated if required? Do we need
to make documentation changes in the [docs
section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_

n/a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants