ref(grouping): Clarify Seer grouping deletion code #95152


Merged
armenzg merged 3 commits into master from ref/seer_deletion/armenzg on Jul 10, 2025

Conversation

@armenzg armenzg (Member) commented Jul 9, 2025

This will make my next pull request easier to review.

Ref inc-1236

@armenzg armenzg self-assigned this Jul 9, 2025
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jul 9, 2025
@@ -1097,13 +1097,6 @@
flags=FLAG_ALLOW_EMPTY | FLAG_AUTOMATOR_MODIFIABLE,
)

-register(
-    "embeddings-grouping.seer.delete-record-batch-size",
Member Author

We're going to use a constant instead of this.
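For context, a rough before/after of how the batch size gets read in the deletion task; the call sites are assumptions pieced together from the hunk below and the test excerpt later in this thread:

from sentry import options

# Before: the batch size was read from the registered option at call time
batch_size = options.get("embeddings-grouping.seer.delete-record-batch-size") or 100

# After: the batch size comes from a module-level constant in delete_seer_grouping_records.py
BATCH_SIZE = 100
batch_size = BATCH_SIZE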

@@ -16,15 +16,15 @@
from sentry.taskworker.namespaces import seer_tasks
from sentry.utils.query import RangeQuerySetWrapper

-BATCH_SIZE = 1000
+BATCH_SIZE = 100
Member Author

The option is set to 100, so let's go with that.


logger = logging.getLogger(__name__)


@instrumented_task(
    name="sentry.tasks.delete_seer_grouping_records_by_hash",
    queue="delete_seer_grouping_records_by_hash",
-    max_retries=0,
+    max_retries=0, # XXX: Why do we not retry?
Member Author

I'm a bit shocked we don't retry. Any thoughts as to why we shouldn't?

Member

You could add a retry handler. Make sure you allowlist one or more exception types, though. Without that, retries don't happen unless a TimeoutError is raised.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have some good examples I can look at?
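A minimal sketch of the kind of retry handler suggested above, assuming Sentry's retry helper in sentry.tasks.base accepts an on= allowlist of exception types; the decorator parameters, retry settings, and the exception chosen here are illustrative rather than what this PR ships:

from requests.exceptions import RequestException

from sentry.tasks.base import instrumented_task, retry


@instrumented_task(
    name="sentry.tasks.delete_seer_grouping_records_by_hash",
    queue="delete_seer_grouping_records_by_hash",
    default_retry_delay=60 * 5,  # illustrative back-off between attempts
    max_retries=5,  # note: this PR keeps max_retries=0
)
@retry(on=(RequestException,))  # only allowlisted exception types trigger a retry
def delete_seer_grouping_records_by_hash(project_id, hashes, last_deleted_index=0):
    # Illustrative signature taken from the test excerpt later in this thread;
    # the real task batches the hashes and calls Seer to delete the records.
    ...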

@armenzg armenzg marked this pull request as ready for review July 9, 2025 18:45
@armenzg armenzg requested review from a team as code owners July 9, 2025 18:45
cursor[bot]

This comment was marked as outdated.


codecov bot commented Jul 9, 2025

Codecov Report

Attention: Patch coverage is 61.11111% with 7 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
.../sentry/tasks/test_delete_seer_grouping_records.py 57.14% 3 Missing ⚠️
src/sentry/tasks/delete_seer_grouping_records.py 50.00% 2 Missing ⚠️
src/sentry/seer/similarity/grouping_records.py 66.66% 1 Missing ⚠️
...ts/sentry/seer/similarity/test_grouping_records.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #95152       +/-   ##
===========================================
- Coverage   87.89%   74.72%   -13.17%     
===========================================
  Files       10458    10457        -1     
  Lines      604829   604842       +13     
  Branches    23607    23607               
===========================================
- Hits       531616   451982    -79634     
- Misses      72852   152499    +79647     
  Partials      361      361               

@markstory markstory (Member) left a comment

Looks good to me.



@cursor cursor bot (Contributor) left a comment

Bug: Test Fails When Batch Size Changes

The test_delete_seer_grouping_records_by_hash_batches assertion hardcodes the expected task end_index as 100. This value should instead use the dynamically calculated batch_size variable, which is derived from the embeddings-grouping.seer.delete-record-batch-size option. If this option's value differs from 100, the test will incorrectly fail.

tests/sentry/tasks/test_delete_seer_grouping_records.py#L28-L36

"""
batch_size = options.get("embeddings-grouping.seer.delete-record-batch-size") or 100
mock_call_seer_to_delete_these_hashes.return_value = True
project_id, hashes = 1, [str(i) for i in range(batch_size + 1)]
# We call it as a function and will schedule a task for the extra hash
delete_seer_grouping_records_by_hash(project_id, hashes, 0)
assert mock_delete_seer_grouping_records_by_hash_apply_async.call_args[1] == {
"args": [project_id, hashes, 100]
}
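A fix along the lines the report suggests would assert against the computed batch_size instead of the literal 100 (sketch only; the surrounding setup is as in the excerpt above):

# Use the configured batch size rather than hardcoding 100
assert mock_delete_seer_grouping_records_by_hash_apply_async.call_args[1] == {
    "args": [project_id, hashes, batch_size]
}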


@armenzg armenzg merged commit 05ca85a into master Jul 10, 2025
65 checks passed
@armenzg armenzg deleted the ref/seer_deletion/armenzg branch July 10, 2025 12:41
andrewshie-sentry pushed a commit that referenced this pull request Jul 14, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Jul 25, 2025