ref(grouping): Clarify Seer grouping deletion code #95152
Conversation
src/sentry/options/defaults.py
Outdated
@@ -1097,13 +1097,6 @@
     flags=FLAG_ALLOW_EMPTY | FLAG_AUTOMATOR_MODIFIABLE,
 )

-register(
-    "embeddings-grouping.seer.delete-record-batch-size",
We're going to use a constant instead of this.
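A minimal sketch of what that swap could look like (the option name comes from the diff above; the chunking helper and its exact location are assumed for illustration, not taken from the PR):

```python
# Sketch: replace the removed options lookup with a module-level constant.
# Previously the batch size came from the option:
#   batch_size = options.get("embeddings-grouping.seer.delete-record-batch-size")
# With the option gone, the task module defines the value directly.

BATCH_SIZE = 100  # matches the value the option was set to in production


def chunk_hashes(hashes: list[str]) -> list[list[str]]:
    # Hypothetical helper: split the hashes to delete into BATCH_SIZE-sized chunks.
    return [hashes[i : i + BATCH_SIZE] for i in range(0, len(hashes), BATCH_SIZE)]
```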
@@ -16,15 +16,15 @@
 from sentry.taskworker.namespaces import seer_tasks
 from sentry.utils.query import RangeQuerySetWrapper

-BATCH_SIZE = 1000
+BATCH_SIZE = 100
The option is set to 100, so let's go with that.
 logger = logging.getLogger(__name__)


 @instrumented_task(
     name="sentry.tasks.delete_seer_grouping_records_by_hash",
     queue="delete_seer_grouping_records_by_hash",
-    max_retries=0,
+    max_retries=0,  # XXX: Why do we not retry?
I'm a bit shocked we don't retry. Any thoughts as to why we shouldn't?
You could add a retry handler. Make sure you allowlist one or more exception types, though; without that, retries don't happen unless a TimeoutError is raised.
Do you have some good examples I can look at?
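One possible shape for this, sketched under the assumption that the @retry decorator from sentry.tasks.base accepts an on= allowlist of exception types (the retry count, delay, and the TimeoutError placeholder here are illustrative, not taken from the PR):

```python
from sentry.tasks.base import instrumented_task, retry


@instrumented_task(
    name="sentry.tasks.delete_seer_grouping_records_by_hash",
    queue="delete_seer_grouping_records_by_hash",
    default_retry_delay=5,
    max_retries=3,
)
@retry(on=(TimeoutError,))  # only allowlisted exception types trigger a retry
def delete_seer_grouping_records_by_hash(
    project_id: int, hashes: list[str], last_deleted_index: int = 0
) -> None:
    ...
```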
Codecov Report

Attention: Patch coverage is

✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #95152       +/-   ##
===========================================
- Coverage   87.89%   74.72%   -13.17%
===========================================
  Files       10458    10457        -1
  Lines      604829   604842       +13
  Branches    23607    23607
===========================================
- Hits       531616   451982    -79634
- Misses      72852   152499    +79647
  Partials      361      361
Looks good to me.
Bug: Test Fails When Batch Size Changes

The test_delete_seer_grouping_records_by_hash_batches assertion hardcodes the expected task end_index as 100. This value should instead use the dynamically calculated batch_size variable, which is derived from the embeddings-grouping.seer.delete-record-batch-size option. If this option's value differs from 100, the test will incorrectly fail.
tests/sentry/tasks/test_delete_seer_grouping_records.py, lines 28 to 36 (commit b95cc57)
""" | |
batch_size = options.get("embeddings-grouping.seer.delete-record-batch-size") or 100 | |
mock_call_seer_to_delete_these_hashes.return_value = True | |
project_id, hashes = 1, [str(i) for i in range(batch_size + 1)] | |
# We call it as a function and will schedule a task for the extra hash | |
delete_seer_grouping_records_by_hash(project_id, hashes, 0) | |
assert mock_delete_seer_grouping_records_by_hash_apply_async.call_args[1] == { | |
"args": [project_id, hashes, 100] | |
} |
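A possible fix, as a sketch: assert against the computed batch_size instead of the literal 100, so the expectation tracks whatever the option returns.

```python
# Use the batch_size computed above rather than hardcoding 100.
assert mock_delete_seer_grouping_records_by_hash_apply_async.call_args[1] == {
    "args": [project_id, hashes, batch_size]
}
```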
This will make my next pull request easier to review. Ref [inc-1236](https://sentry.slack.com/archives/C095HFVE80Y/p1752007736483209)