
fix(grouping): Schedule seer deletion tasks with less hashes #95156


Merged
armenzg merged 6 commits into master from fix/split_seer_deletions/armenzg on Jul 10, 2025

Conversation

@armenzg armenzg (Member) commented Jul 9, 2025

The original code always passed all hashes to every task it spawned, so we could end up with massive task payloads that caused trouble for taskbroker.

We ran into exactly that situation in the last few days, when deleting a project led to hundreds of thousands of hashes being passed to tasks (179k+ hashes -> 6MB+ task payloads).

The changes here take the hashes handed to a task, chunk them, and spawn new tasks that each carry only a small number of hashes.

This moves us from sequential scheduling of tasks to parallelized scheduling.
This could have an impact on the Seer service if a massive number of hashes are requested for deletion.

Ref inc-1236
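
As a rough sanity check on those numbers, a back-of-the-envelope sketch (assuming 32-character hex grouping hashes plus a few bytes of JSON overhead each; the per-hash size is an assumption, not a measurement from the incident):

num_hashes = 179_000      # "179k+ hashes" from the description above
bytes_per_hash = 32 + 4   # assumed: 32 hex chars plus quotes/comma overhead in a JSON payload

payload_mb = num_hashes * bytes_per_hash / 1_000_000
print(f"~{payload_mb:.1f} MB per task payload")  # roughly the 6MB+ observed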

@github-actions github-actions bot added the Scope: Backend label (Automatically applied to PRs that change backend components) on Jul 9, 2025
for i in range(last_deleted_index, len_hashes, BATCH_SIZE):
    # Slice operations are safe and will not raise IndexError
    chunked_hashes = hashes[i : i + BATCH_SIZE]
    delete_seer_grouping_records_by_hash.apply_async(args=[project_id, chunked_hashes, 0])
armenzg (Member Author)

Newer tasks will be scheduled with last_deleted_index=0 since we're scheduling a chunked task.
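
For context, a minimal sketch of how the chunking and the last_deleted_index argument could fit together, assembled from the fragments shown in this diff. The celery shared_task decorator, the BATCH_SIZE value, and the short-circuit for small lists are illustrative assumptions, not the exact implementation:

from celery import shared_task

BATCH_SIZE = 100  # assumed value; the real batch size lives in the task module


def call_seer_to_delete_these_hashes(project_id, hashes):
    # Placeholder for the real Seer deletion call referenced elsewhere in this PR.
    ...


@shared_task
def delete_seer_grouping_records_by_hash(project_id, hashes, last_deleted_index=0):
    len_hashes = len(hashes)
    if len_hashes <= BATCH_SIZE:
        # Small payload: delete directly rather than fanning out again.
        call_seer_to_delete_these_hashes(project_id, hashes)
        return

    # Iterate through hashes in chunks and schedule a task for each chunk.
    # Tasks already in flight may pass a non-zero last_deleted_index, so start
    # from that index; newly scheduled chunks always pass 0.
    for i in range(last_deleted_index, len_hashes, BATCH_SIZE):
        # Slice operations are safe and will not raise IndexError
        chunked_hashes = hashes[i : i + BATCH_SIZE]
        delete_seer_grouping_records_by_hash.apply_async(args=[project_id, chunked_hashes, 0])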

@armenzg armenzg marked this pull request as ready for review July 9, 2025 18:55
@armenzg armenzg requested a review from a team as a code owner July 9, 2025 18:55
# Iterate through hashes in chunks and schedule a task for each chunk
# There are tasks passing last_deleted_index, thus, we need to start from that index
# Eventually all tasks will pass 0
for i in range(last_deleted_index, len_hashes, BATCH_SIZE):
Member
do we want to add similar tests to make sure the right number of tasks get called?

armenzg (Member Author)
I think this test I added yesterday covers it.

def test_call_delete_seer_grouping_records_by_hash_chunked(self) -> None:
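
For reference, a hedged sketch of what such an assertion could look like; the module path comes from the Codecov report below, but the hash format, batch size, and calling convention are assumptions rather than a copy of the real test:

from unittest import mock

from sentry.tasks.delete_seer_grouping_records import delete_seer_grouping_records_by_hash


def test_schedules_one_task_per_chunk() -> None:
    project_id = 1
    hashes = [f"{i:032x}" for i in range(250)]  # 250 fake 32-char hex hashes

    with mock.patch.object(delete_seer_grouping_records_by_hash, "apply_async") as mock_apply:
        # Invoke the task body synchronously so only the fan-out is exercised.
        delete_seer_grouping_records_by_hash(project_id, hashes)

    # With a batch size of 100, 250 hashes should fan out into 3 chunked tasks,
    # each scheduled with last_deleted_index=0.
    assert mock_apply.call_count == 3
    for scheduled in mock_apply.call_args_list:
        _project_id, chunk, last_deleted_index = scheduled.kwargs["args"]
        assert len(chunk) <= 100
        assert last_deleted_index == 0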

codecov bot commented Jul 9, 2025

Codecov Report

Attention: Patch coverage is 75.00000% with 2 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
src/sentry/tasks/delete_seer_grouping_records.py | 75.00% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #95156      +/-   ##
==========================================
+ Coverage   87.84%   87.90%   +0.05%     
==========================================
  Files       10469    10459      -10     
  Lines      605374   604694     -680     
  Branches    23674    23571     -103     
==========================================
- Hits       531819   531575     -244     
+ Misses      73195    72758     -437     
- Partials      360      361       +1     

@armenzg armenzg requested a review from markstory July 9, 2025 19:15
@markstory markstory (Member) left a comment

Makes sense to me.

end_index = min(last_deleted_index + BATCH_SIZE, len_hashes)
call_seer_to_delete_these_hashes(project_id, hashes[last_deleted_index:end_index])
if end_index < len_hashes:
    delete_seer_grouping_records_by_hash.apply_async(args=[project_id, hashes, end_index])
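
Note the re-enqueue on the last line of this pre-fix version: the follow-up task receives the entire hashes list again with only end_index advanced, so every task in the chain carries the full payload. A back-of-the-envelope comparison of the total bytes enqueued under the old and new schemes (hash size and batch size are illustrative assumptions):

NUM_HASHES = 179_000   # from the incident described above
BYTES_PER_HASH = 36    # assumed: 32 hex chars plus JSON overhead
BATCH_SIZE = 100       # assumed batch size

num_tasks = -(-NUM_HASHES // BATCH_SIZE)  # ceiling division: ~1,790 tasks either way

# Old scheme: every re-enqueued task carries the full hash list.
old_total_bytes = num_tasks * NUM_HASHES * BYTES_PER_HASH
# New scheme: each chunked task carries only its own slice.
new_total_bytes = NUM_HASHES * BYTES_PER_HASH

print(f"old: ~{old_total_bytes / 1e9:.1f} GB enqueued in total")  # ~11.5 GB
print(f"new: ~{new_total_bytes / 1e6:.1f} MB enqueued in total")  # ~6.4 MB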
markstory (Member)

Is this where all the continued stream of big tasks was coming from?

armenzg (Member Author)
@markstory yes, this is where it came from

Base automatically changed from ref/seer_deletion/armenzg to master July 10, 2025 12:41
@armenzg armenzg requested a review from a team as a code owner July 10, 2025 12:41
@armenzg armenzg (Member Author) commented Jul 10, 2025

bugbot run

@cursor cursor bot (Contributor) left a comment

✅ BugBot reviewed your changes and found no bugs!



@armenzg armenzg merged commit 130345e into master Jul 10, 2025
66 checks passed
@armenzg armenzg deleted the fix/split_seer_deletions/armenzg branch July 10, 2025 14:18
armenzg added a commit that referenced this pull request Jul 11, 2025
This simplifies the tests for deletion of hashes from Seer and also adds a test for #95156.
andrewshie-sentry pushed a commit that referenced this pull request Jul 14, 2025
andrewshie-sentry pushed a commit that referenced this pull request Jul 14, 2025
sentry-io bot commented Jul 19, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:


Labels
Scope: Backend Automatically applied to PRs that change backend components