fix(grouping): Handle race condition creating grouphash metadata #85917
This is a second attempt to fix the errors we're seeing when enabling backfill for grouphash metadata, wherein a small percentage of new grouphashes can't have their Seer metadata updated because the metadata record "hasn't been created yet."
We noticed that the metadata records to which this happened all seemed to have an id of `None`, so the first attempt at a fix was to call `save` before doing the update in those cases, to make sure the record was in the database. This introduced its own problems, however, in that the `save` call then started throwing an integrity error, complaining that a metadata record for that grouphash already existed. Since the call to Seer only happens with new, unrecognized grouphashes, and since we have a lock to prevent double group creation in those cases, this didn't make a ton of sense... until I realized that the lock doesn't kick in until after we're done calling Seer. Whoops.

As a result, here's what I think is currently happening when the backfill is off:
1. Two events with the same brand-new hash arrive at roughly the same time, and both reach the `GroupHash.objects.get_or_create` call inside of `get_or_create_grouphashes`. (We use `get_or_create` there because the same codepath serves both new and existing hashes.) The slightly-sooner event creates a new grouphash and the slightly-later event gets that newly-created grouphash from the database. Each event has its own `GroupHash` instance to represent the shared database record.
2. Because its grouphash is new, the slightly-sooner event creates a metadata record for it in `create_or_update_grouphash_metadata_if_needed`.
3. Django attaches the resulting `GroupHashMetadata` object to that event's `GroupHash` instance.
4. Because its grouphash isn't new, the slightly-later event skips metadata creation, so its `GroupHash` instance hasn't had a metadata record attached to it.
5. The rest of the `save_event` process proceeds as it normally would.

And here's what I think is happening when the backfill is on:
1. The race to create the grouphash plays out just as before, but now both events go on to call `create_or_update_grouphash_metadata_if_needed`. Within `create_or_update_grouphash_metadata_if_needed`, they both fall into the `if not grouphash.metadata` branch, since neither one has yet had a chance to create a metadata record.
2. The slightly-later event also tries to create metadata for the grouphash, but can't, because the slightly-sooner event already has. Django still attaches a `GroupHashMetadata` object to the event's `GroupHash` instance, but an id never gets filled in because no metadata record was successfully created.

With that scenario in mind, this PR makes three changes:
1. In `create_or_update_grouphash_metadata_if_needed`, switch from using `GroupHashMetadata.objects.create` to using `GroupHashMetadata.objects.get_or_create` (see the sketch below).
2. Do this with just the grouphash filled in, because we know it will be the same between the events.
3. If `get_or_create` indicates that it got an already-existent record, log some info and then bail rather than trying to update the record with more data. That way, only the slightly-sooner event will have a metadata instance attached to its grouphash, and the Seer codepath will behave the way it did before backfill was turned on.

If this works, we can then remove the band-aid from the Seer code.
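For concreteness, here's a minimal sketch of the logic the three changes describe. This is not the actual Sentry code: the function signature, the import paths, and the log message are assumptions, and the real function records far more metadata than is shown here.

```python
import logging

# Assumed import paths for the models involved.
from sentry.models.grouphash import GroupHash
from sentry.models.grouphashmetadata import GroupHashMetadata

logger = logging.getLogger(__name__)


def create_or_update_grouphash_metadata_if_needed(grouphash: GroupHash) -> None:
    # Hypothetical, simplified signature.
    if not grouphash.metadata:
        # Changes 1 and 2: use `get_or_create` rather than `create`, with only the
        # grouphash filled in, since that's the one value guaranteed to match
        # between the two racing events.
        _, created = GroupHashMetadata.objects.get_or_create(grouphash=grouphash)

        if not created:
            # Change 3: the slightly-sooner event already created the record, so
            # log and bail rather than updating it with more data.
            logger.info(
                "grouphash_metadata.creation_race",  # hypothetical log message
                extra={"grouphash_id": grouphash.id},
            )
            return

    # ...fill in / update the rest of the metadata as before...
```

The point of the `get_or_create` swap is that both events can safely make the same call: the loser of the race gets the winner's row back instead of hitting an integrity error, and the early return keeps it from attaching a half-created metadata instance to its grouphash.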