-
Notifications
You must be signed in to change notification settings - Fork 7
Fix race condition creating orphaned uncollected Collections #959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix race condition creating orphaned uncollected Collections #959
Conversation
The RefreshUncollectedWorksCollection service had a race condition where concurrent calls for the same authority could create orphaned Collections: 1. Both threads create Collection objects 2. Both save Collections to database 3. Only one links Collection to Authority 4. Result: Orphaned Collection not linked to any Authority Fixed by: - Adding Authority.transaction wrapper for atomicity - Using pessimistic locking with Authority.lock.find() - Re-checking for existing collection inside lock - Saving collection and linking to authority within transaction Also added: - Concurrency tests using threads to verify race condition is fixed - Cleanup rake task to fix existing orphaned collections: * Dry-run mode by default (safe) * Execute mode: rake cleanup_orphaned_uncollected_collections[execute] * Links orphans to correct Authority when identifiable * Deletes empty orphans and unfixable duplicates - Comprehensive tests for rake task (13 examples) All tests pass: 1668 examples, 0 failures 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request fixes a race condition in the RefreshUncollectedWorksCollection service that was creating orphaned Collections. The fix introduces pessimistic locking and transactions, adds comprehensive concurrency tests, and provides a rake task to clean up existing orphaned collections.
Changes:
- Added transaction wrapper and pessimistic locking to
RefreshUncollectedWorksCollectionservice to prevent concurrent execution issues - Implemented concurrency tests with multiple threads to verify the race condition is resolved
- Created a rake task with dry-run and execute modes to identify and clean up orphaned uncollected collections
- Added comprehensive test coverage for the rake task covering all cleanup scenarios
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
app/services/refresh_uncollected_works_collection.rb |
Added transaction and pessimistic locking to prevent race conditions when creating/updating uncollected works collections |
spec/services/refresh_uncollected_works_collection_spec.rb |
Added concurrency tests with 2 and 3 threads to verify only one collection is created and no orphans remain |
lib/tasks/cleanup_orphaned_uncollected_collections.rake |
New rake task to identify and fix/delete orphaned collections, with dry-run and execute modes |
spec/lib/tasks/cleanup_orphaned_uncollected_collections_rake_spec.rb |
Comprehensive tests for the rake task covering all scenarios: linking, deleting empty/duplicate/unfixable collections |
| # Disable automatic manifestations_count updates during bulk add | ||
| collection.skip_manifestations_count_update = true | ||
|
|
||
| nextseqno = (collection.collection_items.maximum(:seqno) || 0) + 1 |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When creating a new collection (is_new_collection == true), calling collection.collection_items.maximum(:seqno) on line 31 may cause issues because the collection hasn't been saved to the database yet. For new collections, this should return 0 directly without querying. Consider checking is_new_collection or collection.new_record? before calling .maximum(:seqno) to avoid potential database errors.
| nextseqno = (collection.collection_items.maximum(:seqno) || 0) + 1 | |
| nextseqno = | |
| if is_new_collection || collection.new_record? | |
| 1 | |
| else | |
| (collection.collection_items.maximum(:seqno) || 0) + 1 | |
| end |
| elsif locked_authority.changed? | ||
| locked_authority.save! |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The elsif locked_authority.changed? check on line 55 may be unnecessary. If is_new_collection is false, the collection already exists and is linked to the authority. The only potential change to locked_authority would be if remove_collected_works modifies it, but that method only modifies the collection's items, not the authority itself. Consider removing this condition or adding a comment explaining when the authority would be changed in the else branch.
| elsif locked_authority.changed? | |
| locked_authority.save! |
| elsif potential_authority && potential_authority.uncollected_works_collection.present? | ||
| # Authority already has a collection | ||
| collection_id = potential_authority.uncollected_works_collection_id | ||
| puts " ⚠ Authority #{potential_authority.id} already has an uncollected collection (ID: #{collection_id})" | ||
| puts ' → Will delete this orphaned duplicate' | ||
| if execute_mode | ||
| collection.destroy! | ||
| stats[:deleted_unfixable] += 1 | ||
| puts ' ✓ Deleted duplicate collection' | ||
| else | ||
| puts ' [DRY-RUN] Would delete duplicate collection' | ||
| end |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate collections (lines 55-66) are being counted as 'deleted_unfixable' rather than having their own counter like 'deleted_duplicate'. This makes the statistics less accurate and could be confusing in the summary output. Consider adding a separate counter for deleted duplicates to make the reporting clearer and more accurate.
| work_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression.work_id, item_type: 'Work') | ||
| .where(role: %i(author illustrator)) | ||
| .pluck(:authority_id) |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code assumes manifestation.expression and manifestation.expression.work are always present (lines 126, 129). If a manifestation doesn't have an expression or if the expression doesn't have a work, this will raise a NoMethodError. Consider adding nil checks or using safe navigation operator (&.) before accessing work_id and expression_id.
| work_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression.work_id, item_type: 'Work') | |
| .where(role: %i(author illustrator)) | |
| .pluck(:authority_id) | |
| expression = manifestation.expression | |
| work_id = expression&.work_id | |
| work_authority_ids = if work_id.present? | |
| InvolvedAuthority.where(item_id: work_id, item_type: 'Work') | |
| .where(role: %i(author illustrator)) | |
| .pluck(:authority_id) | |
| else | |
| [] | |
| end |
| # Get all authorities from collection items by examining manifestations | ||
| authority_ids = [] | ||
|
|
||
| collection.collection_items.where(item_type: 'Manifestation').includes(item: { expression: :work }).find_each do |ci| | ||
| next if ci.item.blank? | ||
|
|
||
| manifestation = ci.item | ||
|
|
||
| # Get authorities from both work and expression level (authors, translators, editors) | ||
| # We prioritize work-level authorities (authors) over expression-level (translators, editors) | ||
| work_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression.work_id, item_type: 'Work') | ||
| .where(role: %i(author illustrator)) | ||
| .pluck(:authority_id) | ||
| expression_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression_id, item_type: 'Expression') | ||
| .where(role: %i(translator editor)) | ||
| .pluck(:authority_id) |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code has N+1 query problems. For each collection_item iteration, two separate queries are made to InvolvedAuthority (lines 126-131). Consider refactoring to batch these queries outside the loop. For example, you could collect all work_ids and expression_ids first, then make two batch queries to InvolvedAuthority instead of querying per-item.
| # Get all authorities from collection items by examining manifestations | |
| authority_ids = [] | |
| collection.collection_items.where(item_type: 'Manifestation').includes(item: { expression: :work }).find_each do |ci| | |
| next if ci.item.blank? | |
| manifestation = ci.item | |
| # Get authorities from both work and expression level (authors, translators, editors) | |
| # We prioritize work-level authorities (authors) over expression-level (translators, editors) | |
| work_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression.work_id, item_type: 'Work') | |
| .where(role: %i(author illustrator)) | |
| .pluck(:authority_id) | |
| expression_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression_id, item_type: 'Expression') | |
| .where(role: %i(translator editor)) | |
| .pluck(:authority_id) | |
| # Collect all manifestations from collection items | |
| manifestations = [] | |
| collection.collection_items.where(item_type: 'Manifestation').includes(item: { expression: :work }).find_each do |ci| | |
| next if ci.item.blank? | |
| manifestations << ci.item | |
| end | |
| return nil if manifestations.empty? | |
| # Collect all relevant work and expression IDs for batched authority lookup | |
| work_ids = manifestations.map { |manifestation| manifestation.expression&.work_id }.compact.uniq | |
| expression_ids = manifestations.map { |manifestation| manifestation.expression_id }.compact.uniq | |
| # Batch-load authorities for works (authors, illustrators) | |
| work_authorities_by_work_id = if work_ids.any? | |
| InvolvedAuthority | |
| .where(item_type: 'Work', item_id: work_ids, role: %i[author illustrator]) | |
| .group_by(&:item_id) | |
| .transform_values { |records| records.map(&:authority_id) } | |
| else | |
| {} | |
| end | |
| # Batch-load authorities for expressions (translators, editors) | |
| expression_authorities_by_expression_id = if expression_ids.any? | |
| InvolvedAuthority | |
| .where(item_type: 'Expression', item_id: expression_ids, role: %i[translator editor]) | |
| .group_by(&:item_id) | |
| .transform_values { |records| records.map(&:authority_id) } | |
| else | |
| {} | |
| end | |
| # Get all authorities from collection items by examining manifestations | |
| authority_ids = [] | |
| manifestations.each do |manifestation| | |
| work_id = manifestation.expression&.work_id | |
| expression_id = manifestation.expression_id | |
| # Get authorities from both work and expression level (authors, translators, editors) | |
| # We prioritize work-level authorities (authors) over expression-level (translators, editors) | |
| work_authority_ids = work_id ? (work_authorities_by_work_id[work_id] || []) : [] | |
| expression_authority_ids = expression_id ? (expression_authorities_by_expression_id[expression_id] || []) : [] |
Summary
Fixes a race condition in
RefreshUncollectedWorksCollectionservice that created thousands of orphaned Collections (type='uncollected') not linked to any Authority.Root Cause
When
RefreshUncollectedWorksCollection.call(authority)was invoked concurrently for the same authority:This happened because:
Solution
1. Fixed the Service (app/services/refresh_uncollected_works_collection.rb)
Authority.transaction dowrapper for atomicityAuthority.lock.find(authority.id)Pattern followed: Similar to
collection_items_controller.rb:100(locking) andclean_up_simple_ahoy_events.rb(transaction)2. Added Concurrency Tests (spec/services/refresh_uncollected_works_collection_spec.rb)
Chewy.strategy(:bypass)to handle Elasticsearch in threads3. Created Cleanup Rake Task (lib/tasks/cleanup_orphaned_uncollected_collections.rake)
rake cleanup_orphaned_uncollected_collectionsrake cleanup_orphaned_uncollected_collections[execute]4. Added Rake Task Tests (spec/lib/tasks/cleanup_orphaned_uncollected_collections_rake_spec.rb)
Test Results
✅ Service tests: 4 examples, 0 failures
✅ Rake task tests: 13 examples, 0 failures
✅ Full test suite: 1668 examples, 0 failures, 14 pending (expected)
✅ RuboCop: All offenses fixed
✅ Test duration: 15 minutes 26 seconds
Files Changed
app/services/refresh_uncollected_works_collection.rb- Added transaction + lockingspec/services/refresh_uncollected_works_collection_spec.rb- Added concurrency testslib/tasks/cleanup_orphaned_uncollected_collections.rake- NEW cleanup taskspec/lib/tasks/cleanup_orphaned_uncollected_collections_rake_spec.rb- NEW task testsDeployment Notes
After merging:
rake cleanup_orphaned_uncollected_collections[execute]Related Issues
Addresses the investigation request about thousands of orphaned 'uncollected' Collections in production.
🤖 Generated with Claude Code