@abartov abartov commented Jan 29, 2026

Summary

Fixes a race condition in the RefreshUncollectedWorksCollection service that created thousands of orphaned Collections (type='uncollected') not linked to any Authority.

Root Cause

When RefreshUncollectedWorksCollection.call(authority) was invoked concurrently for the same authority:

  1. Thread A: Creates Collection, saves it to DB, hasn't linked yet
  2. Thread B: Creates Collection, saves it to DB, hasn't linked yet
  3. Thread A: Links its Collection to Authority (wins race)
  4. Thread B: Finds Authority already has a collection, skips linking
  5. Result: Thread B's Collection is orphaned!

This happened because:

  • The Collection was saved to the DB (line 39) before being linked to the Authority (lines 45-48)
  • No transaction wrapper or lock prevented concurrent execution
  • The service was called from multiple places without synchronization
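
The interleaving above can be replayed deterministically with plain Ruby objects. This is only an illustrative sketch: SketchCollection, SketchAuthority and replay_race are hypothetical names, the "tables" are in-memory arrays, and none of it is taken from the service itself.

```ruby
# Hypothetical in-memory stand-ins for the two tables; not the app's models.
SketchCollection = Struct.new(:id, :collection_type)
SketchAuthority  = Struct.new(:id, :uncollected_works_collection_id)

# Replays the five steps in a fixed order, so the outcome is deterministic.
def replay_race
  collections = []                          # plays the role of the collections table
  authority   = SketchAuthority.new(1, nil) # an authority with no collection yet

  # Steps 1-2: each caller saves its own collection before linking it.
  save = lambda do
    collection = SketchCollection.new(collections.size + 1, 'uncollected')
    collections << collection
    collection
  end

  # Steps 3-4: link only if the authority has no collection yet.
  link_if_unlinked = lambda do |collection|
    next false unless authority.uncollected_works_collection_id.nil?

    authority.uncollected_works_collection_id = collection.id
    true
  end

  a = save.call                       # Thread A saves
  b = save.call                       # Thread B saves
  a_linked = link_if_unlinked.call(a) # Thread A links and wins
  b_linked = link_if_unlinked.call(b) # Thread B sees a link and skips

  # Step 5: every saved collection the authority does not point at is orphaned.
  orphan_ids = collections.map(&:id) - [authority.uncollected_works_collection_id]
  { a_linked: a_linked, b_linked: b_linked, orphan_ids: orphan_ids }
end

p replay_race
```

Both collections are saved, only one is linked, and the second stays behind with nothing pointing at it, which is the orphan described above.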

Solution

1. Fixed the Service (app/services/refresh_uncollected_works_collection.rb)

  • ✅ Added Authority.transaction do wrapper for atomicity
  • ✅ Implemented pessimistic locking: Authority.lock.find(authority.id)
  • ✅ Re-check for existing collection inside lock (prevents race)
  • ✅ Save collection and link to authority within same transaction
  • ✅ Removed redundant rubocop directives

Pattern followed: Similar to collection_items_controller.rb:100 (locking) and clean_up_simple_ahoy_events.rb (transaction)
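
The lock-then-re-check idea can be sketched in plain Ruby. This is an analogy under stated assumptions, not the service code: a Mutex stands in for the row lock that Authority.lock.find(authority.id) takes inside Authority.transaction, and InMemoryAuthority and refresh_uncollected are hypothetical names.

```ruby
# Pure-Ruby analogy: the Mutex plays the role of the database row lock.
class InMemoryAuthority
  attr_reader :collection_id

  def initialize
    @row_lock = Mutex.new # stand-in for the pessimistic lock on the authority row
    @collection_id = nil
  end

  # Runs the block while holding the lock, like the body of the transaction.
  def with_lock(&block)
    @row_lock.synchronize(&block)
  end

  def link(collection_id)
    @collection_id = collection_id
  end
end

def refresh_uncollected(authority, saved_collections)
  authority.with_lock do
    # Re-check inside the lock: another caller may have linked a collection
    # while this one was waiting for the lock.
    next authority.collection_id if authority.collection_id

    new_id = saved_collections.size + 1
    saved_collections << new_id # "save" the collection...
    authority.link(new_id)      # ...and link it before the lock is released
    new_id
  end
end

authority = InMemoryAuthority.new
saved = []
threads = Array.new(8) { Thread.new { refresh_uncollected(authority, saved) } }
results = threads.map(&:value)
puts "collections saved: #{saved.size}, ids returned: #{results.uniq.inspect}"
```

Because the existence check, the save and the link all happen while the lock is held, a second caller can only observe "no collection yet" or "collection already linked", never the in-between state that produced orphans.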

2. Added Concurrency Tests (spec/services/refresh_uncollected_works_collection_spec.rb)

  • ✅ Test with 2 concurrent threads - verifies only 1 collection created
  • ✅ Test with 3 concurrent threads - verifies no lock contention errors
  • ✅ Uses Chewy.strategy(:bypass) to handle Elasticsearch in threads
  • ✅ Verifies no orphaned collections left after concurrent execution
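
A thread-based check of this kind could be structured roughly as follows. The helper run_concurrently is a hypothetical name and is not taken from the spec file; it uses only core Ruby (Thread, Queue, Mutex) and leaves out everything Rails-specific, including the Chewy.strategy(:bypass) wrapper mentioned above.

```ruby
# Runs the block in `count` threads that are released together and returns
# [values, errors], so a test can assert on both results and failures.
def run_concurrently(count, &block)
  gate = Queue.new
  threads = Array.new(count) do |index|
    Thread.new do
      gate.pop # wait until the main thread opens the gate
      begin
        [:ok, block.call(index)]
      rescue StandardError => e
        [:error, e]
      end
    end
  end
  count.times { gate << :go } # release every waiting thread
  outcomes = threads.map(&:value)
  values = outcomes.select { |tag, _| tag == :ok }.map(&:last)
  errors = outcomes.select { |tag, _| tag == :error }.map(&:last)
  [values, errors]
end

# Usage: three callers race to create a single shared record.
lock = Mutex.new
created = []
values, errors = run_concurrently(3) do |_index|
  lock.synchronize do
    created << :collection if created.empty?
    created.size
  end
end
puts "created=#{created.size} values=#{values.inspect} errors=#{errors.size}"
```

Collecting exceptions instead of letting them surface when the threads are joined makes it possible to assert "no lock contention errors" explicitly, alongside the count of created records.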

3. Created Cleanup Rake Task (lib/tasks/cleanup_orphaned_uncollected_collections.rake)

  • Dry-run mode by default (safe): rake cleanup_orphaned_uncollected_collections
  • Execute mode: rake cleanup_orphaned_uncollected_collections[execute]
  • ✅ Identifies orphaned collections (type='uncollected', not linked to Authority)
  • ✅ Tries to link them by examining CollectionItems (checks authors via InvolvedAuthority)
  • ✅ Handles edge cases:
    • Links to correct Authority when identifiable
    • Deletes empty collections
    • Deletes duplicates when Authority already has a collection
    • Deletes unfixable collections with warning
  • ✅ Outputs detailed statistics
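
The decision logic listed above might look roughly like this when reduced to plain Ruby. Everything here is a sketch under assumptions: OrphanRecord, AuthorityRecord, classify_orphan and cleanup are hypothetical names, the order of the checks is a guess, and counting duplicates under their own key is a choice made for this example rather than a description of the task's output.

```ruby
# Hypothetical in-memory records; the real task works on ActiveRecord models.
OrphanRecord    = Struct.new(:id, :item_count, :candidate_authority)
AuthorityRecord = Struct.new(:id, :uncollected_works_collection_id)

# Decides what would happen to one orphaned collection.
def classify_orphan(orphan)
  authority = orphan.candidate_authority
  return :delete_empty if orphan.item_count.zero?
  return :delete_unfixable if authority.nil?
  return :delete_duplicate unless authority.uncollected_works_collection_id.nil?

  :link
end

# Dry-run by default: only counts. With execute: true it also links orphans.
# Deletion is not modelled here; it is only counted.
def cleanup(orphans, execute: false)
  stats = Hash.new(0)
  orphans.each do |orphan|
    action = classify_orphan(orphan)
    stats[action] += 1
    next unless execute && action == :link

    orphan.candidate_authority.uncollected_works_collection_id = orphan.id
  end
  stats
end

free  = AuthorityRecord.new(1, nil)
taken = AuthorityRecord.new(2, 99)
orphans = [
  OrphanRecord.new(10, 3, free),  # linkable
  OrphanRecord.new(11, 0, nil),   # empty
  OrphanRecord.new(12, 2, taken), # duplicate
  OrphanRecord.new(13, 1, nil)    # no identifiable authority
]
p cleanup(orphans)                # dry-run, changes nothing
p cleanup(orphans, execute: true) # links orphan 10 to authority 1
```

Keeping the classification separate from the side effects is what makes a dry-run mode cheap: both modes share the same decisions and differ only in whether they act on them.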

4. Added Rake Task Tests (spec/lib/tasks/cleanup_orphaned_uncollected_collections_rake_spec.rb)

  • ✅ Tests dry-run mode (no changes)
  • ✅ Tests all execute mode scenarios (13 examples)
  • ✅ Tests linking orphans to correct authority
  • ✅ Tests deleting empty/duplicate/unfixable orphans
  • ✅ Tests idempotency

Test Results

Service tests: 4 examples, 0 failures
Rake task tests: 13 examples, 0 failures
Full test suite: 1668 examples, 0 failures, 14 pending (expected)
RuboCop: All offenses fixed
Test duration: 15 minutes 26 seconds

Files Changed

  • app/services/refresh_uncollected_works_collection.rb - Added transaction + locking
  • spec/services/refresh_uncollected_works_collection_spec.rb - Added concurrency tests
  • lib/tasks/cleanup_orphaned_uncollected_collections.rake - NEW cleanup task
  • spec/lib/tasks/cleanup_orphaned_uncollected_collections_rake_spec.rb - NEW task tests

Deployment Notes

After merging:

  1. Run cleanup task in production: rake cleanup_orphaned_uncollected_collections[execute]
  2. Monitor output to see how many orphans were fixed/deleted
  3. No downtime required - changes are backward compatible

Related Issues

Addresses the investigation request about thousands of orphaned 'uncollected' Collections in production.

🤖 Generated with Claude Code

The RefreshUncollectedWorksCollection service had a race condition where
concurrent calls for the same authority could create orphaned Collections:
1. Both threads create Collection objects
2. Both save Collections to database
3. Only one links Collection to Authority
4. Result: Orphaned Collection not linked to any Authority

Fixed by:
- Adding Authority.transaction wrapper for atomicity
- Using pessimistic locking with Authority.lock.find()
- Re-checking for existing collection inside lock
- Saving collection and linking to authority within transaction

Also added:
- Concurrency tests using threads to verify race condition is fixed
- Cleanup rake task to fix existing orphaned collections:
  * Dry-run mode by default (safe)
  * Execute mode: rake cleanup_orphaned_uncollected_collections[execute]
  * Links orphans to correct Authority when identifiable
  * Deletes empty orphans and unfixable duplicates
- Comprehensive tests for rake task (13 examples)

All tests pass: 1668 examples, 0 failures

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Copilot AI review requested due to automatic review settings January 29, 2026 06:59

Copilot AI left a comment

Pull request overview

This pull request fixes a race condition in the RefreshUncollectedWorksCollection service that was creating orphaned Collections. The fix introduces pessimistic locking and transactions, adds comprehensive concurrency tests, and provides a rake task to clean up existing orphaned collections.

Changes:

  • Added transaction wrapper and pessimistic locking to RefreshUncollectedWorksCollection service to prevent concurrent execution issues
  • Implemented concurrency tests with multiple threads to verify the race condition is resolved
  • Created a rake task with dry-run and execute modes to identify and clean up orphaned uncollected collections
  • Added comprehensive test coverage for the rake task covering all cleanup scenarios

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File - Description
  • app/services/refresh_uncollected_works_collection.rb - Added transaction and pessimistic locking to prevent race conditions when creating/updating uncollected works collections
  • spec/services/refresh_uncollected_works_collection_spec.rb - Added concurrency tests with 2 and 3 threads to verify only one collection is created and no orphans remain
  • lib/tasks/cleanup_orphaned_uncollected_collections.rake - New rake task to identify and fix/delete orphaned collections, with dry-run and execute modes
  • spec/lib/tasks/cleanup_orphaned_uncollected_collections_rake_spec.rb - Comprehensive tests for the rake task covering all scenarios: linking, deleting empty/duplicate/unfixable collections

# Disable automatic manifestations_count updates during bulk add
collection.skip_manifestations_count_update = true

nextseqno = (collection.collection_items.maximum(:seqno) || 0) + 1

Copilot AI Jan 29, 2026

When creating a new collection (is_new_collection == true), calling collection.collection_items.maximum(:seqno) on line 31 may cause issues because the collection hasn't been saved to the database yet. For new collections, this should return 0 directly without querying. Consider checking is_new_collection or collection.new_record? before calling .maximum(:seqno) to avoid potential database errors.

Suggested change
- nextseqno = (collection.collection_items.maximum(:seqno) || 0) + 1
+ nextseqno =
+   if is_new_collection || collection.new_record?
+     1
+   else
+     (collection.collection_items.maximum(:seqno) || 0) + 1
+   end

Copilot uses AI. Check for mistakes.
Comment on lines +55 to +56
elsif locked_authority.changed?
  locked_authority.save!

Copilot AI Jan 29, 2026

The elsif locked_authority.changed? check on line 55 may be unnecessary. If is_new_collection is false, the collection already exists and is linked to the authority. The only potential change to locked_authority would be if remove_collected_works modifies it, but that method only modifies the collection's items, not the authority itself. Consider removing this condition or adding a comment explaining when the authority would be changed in the else branch.

Suggested change
- elsif locked_authority.changed?
-   locked_authority.save!

Comment on lines +55 to +66
elsif potential_authority && potential_authority.uncollected_works_collection.present?
  # Authority already has a collection
  collection_id = potential_authority.uncollected_works_collection_id
  puts " ⚠ Authority #{potential_authority.id} already has an uncollected collection (ID: #{collection_id})"
  puts ' → Will delete this orphaned duplicate'
  if execute_mode
    collection.destroy!
    stats[:deleted_unfixable] += 1
    puts ' ✓ Deleted duplicate collection'
  else
    puts ' [DRY-RUN] Would delete duplicate collection'
  end

Copilot AI Jan 29, 2026

Duplicate collections (lines 55-66) are being counted as 'deleted_unfixable' rather than having their own counter like 'deleted_duplicate'. This makes the statistics less accurate and could be confusing in the summary output. Consider adding a separate counter for deleted duplicates to make the reporting clearer and more accurate.

Comment on lines +126 to +128
work_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression.work_id, item_type: 'Work')
                                      .where(role: %i(author illustrator))
                                      .pluck(:authority_id)

Copilot AI Jan 29, 2026

The code assumes manifestation.expression and manifestation.expression.work are always present (lines 126, 129). If a manifestation doesn't have an expression or if the expression doesn't have a work, this will raise a NoMethodError. Consider adding nil checks or using safe navigation operator (&.) before accessing work_id and expression_id.

Suggested change
- work_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression.work_id, item_type: 'Work')
-                                       .where(role: %i(author illustrator))
-                                       .pluck(:authority_id)
+ expression = manifestation.expression
+ work_id = expression&.work_id
+ work_authority_ids = if work_id.present?
+                        InvolvedAuthority.where(item_id: work_id, item_type: 'Work')
+                                         .where(role: %i(author illustrator))
+                                         .pluck(:authority_id)
+                      else
+                        []
+                      end

Comment on lines +116 to +131
# Get all authorities from collection items by examining manifestations
authority_ids = []

collection.collection_items.where(item_type: 'Manifestation').includes(item: { expression: :work }).find_each do |ci|
  next if ci.item.blank?

  manifestation = ci.item

  # Get authorities from both work and expression level (authors, translators, editors)
  # We prioritize work-level authorities (authors) over expression-level (translators, editors)
  work_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression.work_id, item_type: 'Work')
                                        .where(role: %i(author illustrator))
                                        .pluck(:authority_id)
  expression_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression_id, item_type: 'Expression')
                                              .where(role: %i(translator editor))
                                              .pluck(:authority_id)

Copilot AI Jan 29, 2026

This code has N+1 query problems. For each collection_item iteration, two separate queries are made to InvolvedAuthority (lines 126-131). Consider refactoring to batch these queries outside the loop. For example, you could collect all work_ids and expression_ids first, then make two batch queries to InvolvedAuthority instead of querying per-item.

Suggested change
- # Get all authorities from collection items by examining manifestations
- authority_ids = []
- collection.collection_items.where(item_type: 'Manifestation').includes(item: { expression: :work }).find_each do |ci|
-   next if ci.item.blank?
-   manifestation = ci.item
-   # Get authorities from both work and expression level (authors, translators, editors)
-   # We prioritize work-level authorities (authors) over expression-level (translators, editors)
-   work_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression.work_id, item_type: 'Work')
-                                         .where(role: %i(author illustrator))
-                                         .pluck(:authority_id)
-   expression_authority_ids = InvolvedAuthority.where(item_id: manifestation.expression_id, item_type: 'Expression')
-                                               .where(role: %i(translator editor))
-                                               .pluck(:authority_id)
+ # Collect all manifestations from collection items
+ manifestations = []
+ collection.collection_items.where(item_type: 'Manifestation').includes(item: { expression: :work }).find_each do |ci|
+   next if ci.item.blank?
+   manifestations << ci.item
+ end
+ return nil if manifestations.empty?
+ # Collect all relevant work and expression IDs for batched authority lookup
+ work_ids = manifestations.map { |manifestation| manifestation.expression&.work_id }.compact.uniq
+ expression_ids = manifestations.map { |manifestation| manifestation.expression_id }.compact.uniq
+ # Batch-load authorities for works (authors, illustrators)
+ work_authorities_by_work_id = if work_ids.any?
+                                 InvolvedAuthority
+                                   .where(item_type: 'Work', item_id: work_ids, role: %i[author illustrator])
+                                   .group_by(&:item_id)
+                                   .transform_values { |records| records.map(&:authority_id) }
+                               else
+                                 {}
+                               end
+ # Batch-load authorities for expressions (translators, editors)
+ expression_authorities_by_expression_id = if expression_ids.any?
+                                             InvolvedAuthority
+                                               .where(item_type: 'Expression', item_id: expression_ids, role: %i[translator editor])
+                                               .group_by(&:item_id)
+                                               .transform_values { |records| records.map(&:authority_id) }
+                                           else
+                                             {}
+                                           end
+ # Get all authorities from collection items by examining manifestations
+ authority_ids = []
+ manifestations.each do |manifestation|
+   work_id = manifestation.expression&.work_id
+   expression_id = manifestation.expression_id
+   # Get authorities from both work and expression level (authors, translators, editors)
+   # We prioritize work-level authorities (authors) over expression-level (translators, editors)
+   work_authority_ids = work_id ? (work_authorities_by_work_id[work_id] || []) : []
+   expression_authority_ids = expression_id ? (expression_authorities_by_expression_id[expression_id] || []) : []
