Implement fully parallel upload processing #658
base: main
Conversation
Codecov Report

✅ All tests successful. No failed tests found.

@@ Coverage Diff @@
##             main     #658      +/-   ##
==========================================
- Coverage   98.09%   98.08%   -0.01%
==========================================
  Files         434      435       +1
  Lines       36548    36590      +42
==========================================
+ Hits        35852    35891      +39
- Misses        696      699       +3

Flags with carried forward coverage won't be shown.
Codecov Report

Changes have been made to critical files, which contain lines commonly executed in production.

Additional details and impacted files:

@@ Coverage Diff @@
##             main     #658      +/-   ##
==========================================
- Coverage   98.13%   98.13%   -0.01%
==========================================
  Files         475      476       +1
  Lines       37903    37945      +42
==========================================
+ Hits        37198    37237      +39
- Misses        705      708       +3

Flags with carried forward coverage won't be shown.
Force-pushed from 7635506 to b9f675a
This adds another feature/rollout flag which prefers the parallel upload processing pipeline instead of running it as an experiment.

Upload Processing can run in essentially 4 modes:
- Completely serial processing
- Serial processing, but running "experiment" code (`is_experiment_serial`):
  - In this mode, each `UploadProcessor` task saves a copy of the raw upload, as well as a copy of the final report (`is_final`) for later verification.
- Parallel processing, but running "experiment" code (`is_experiment_parallel`):
  - In this mode, another parallel set of `UploadProcessor` tasks runs *after* the main set of tasks.
  - These tasks use the copied-over raw uploads that were prepared by the `is_experiment_serial` tasks to do their processing.
  - These tasks do not persist any of their results in the database; instead, the final `UploadFinisher` task will launch the `ParallelVerification` task.
- Fully parallel processing (`is_fully_parallel`):
  - In this mode, the final `UploadFinisher` task is responsible for merging the final report and persisting it.

An example task chain might look like this, in "experiment" mode:
- Upload
- UploadProcessor (`is_experiment_serial`)
- UploadProcessor (`is_experiment_serial`)
- UploadProcessor (`is_experiment_serial`, `is_final`)
- UploadFinisher
- UploadProcessor (`is_experiment_parallel`)
- UploadProcessor (`is_experiment_parallel`)
- UploadProcessor (`is_experiment_parallel`)
- UploadFinisher (`is_experiment_parallel`)
- ParallelVerification

Once implemented, `is_fully_parallel` will look like this:
- Upload
- UploadProcessor (`is_fully_parallel`)
- UploadProcessor (`is_fully_parallel`)
- UploadProcessor (`is_fully_parallel`)
- UploadFinisher (`is_fully_parallel`)
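As an illustration of the fully parallel mode described above, where each processor produces an independent partial result and only the final `UploadFinisher` merges and persists, a minimal sketch might look like this. All names and data shapes here are invented for illustration; the real tasks operate on codecov's `Report` objects.

```python
# Hypothetical sketch: each UploadProcessor yields a partial per-file line
# coverage mapping, and the finisher merges them before persisting once.
def merge_partial_reports(partials):
    merged = {}
    for partial in partials:
        for filename, lines in partial.items():
            # Union per-file covered lines across all partial reports.
            merged.setdefault(filename, set()).update(lines)
    return merged

merged = merge_partial_reports([
    {"app.py": {1, 2}},
    {"app.py": {2, 3}, "util.py": {10}},
])
# merged now maps each file to the union of covered lines
```

The point of the sketch is only that merging happens in one place, at the end, rather than each processor mutating shared state under a lock.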
Force-pushed from b9f675a to f1ab443
Just a few comments
)

-    if not do_parallel_processing:
+    if not parallel_processing.run_experiment:
Is this correct? Should this actually be `if not parallel_processing.run_fully_parallel`?
In the `run_fully_parallel` case, we already early-returned just the parallel tasks. So down here we either run the experiment, or fully serial.
Upload Processing can run in essentially 4 modes:
- Completely serial processing
- Serial processing, but running "experiment" code (`is_experiment_serial`):
I'm not super clear about what the difference is between `is_experiment_serial` and `is_experiment_parallel`.
When doing the "experiment", the processor and finisher task run twice for the same upload.
There is also an example flow showing those two runs, and which flags they use.
How can I make this clearer?
Hmm, I think my main confusion is around their dependency (i.e. you can't have both of them be true at the same time) and about how serial is "experimental" (because it's the default option right now). Looking at the code, I guess the "default" serial path is separate from the experiment, so there is (in some paths) a distinction on "experiment" serial and "default" serial.
To address the issue of both variables being dependent, I'm wondering if it'll be clearer if we used an enum called "ExperimentMode" or something. This way, they cannot be true at the same time, and we can just use the same variable to distinguish these cases.
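The suggested enum could look roughly like this. This is a sketch only; `ExperimentMode` and its variant names are hypothetical, not existing code in the repo.

```python
from enum import Enum

class ExperimentMode(Enum):
    # One value replaces the two dependent booleans, making the invalid
    # "both experiment flags true at once" state unrepresentable.
    SERIAL = "serial"                            # default path, no experiment
    EXPERIMENT_SERIAL = "experiment_serial"      # serial leg of the experiment
    EXPERIMENT_PARALLEL = "experiment_parallel"  # parallel re-run for verification
    FULLY_PARALLEL = "fully_parallel"            # parallel pipeline is authoritative

mode = ExperimentMode.EXPERIMENT_SERIAL
assert mode is not ExperimentMode.EXPERIMENT_PARALLEL
```

A task could then branch on a single `mode` value instead of checking two flags whose combinations are only partially valid.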
i think this is somewhat cleaner than the existing harness but to be honest i still feel a little uncomfortable with all the if/else branches peppered around to copy something here and skip writing something there. it feels too easy to accidentally break real processing or leave side-effects that real users will be able to see
the approach i imagine would be simpler would be a separate task that either runs nightly and chooses a batch of N commits, or is scheduled as a followup after X% of finisher tasks. this task would fetch completed report JSONs and use the sessions list from them to reconstruct `UploadTask` arguments but with dummy commits/repos owned by Codecov plugged in. one dummy repo would be overridden into the expt and the other overridden out of it. we run the identical task arguments for each repo and compare the results

with that approach, any and all copying/staging we need to do for verification can happen in one place, and there's little to no risk of our test procedure accidentally breaking things for production users or accidentally leaving side-effects that they can see. there's nothing to clean up when transitioning from validation to running the actual experiment, it's just a `Feature` with a `test` and `control` group. it doesn't faithfully reproduce carryforward inheritance, but CFF is all settled before anything changes for parallel processing anyway. i think the main downside is having to suppress GitHub API errors because our dummy repos probably won't have unique authentic commits/PRs for each batch of tasks we want to test
out of steam for the day but will see your thoughts tomorrow
run_fully_parallel = FULLY_PARALLEL_UPLOAD_PROCESSING_BY_REPO.check_value(
    identifier=repoid, default=False
)
run_experiment = (
    False
    if run_fully_parallel
    else PARALLEL_UPLOAD_PROCESSING_BY_REPO.check_value(
        identifier=repoid, default=False
    )
)
i don't really grok what these mean individually or what it means if they overlap

`FULLY_PARALLEL_UPLOAD_PROCESSING` appears to be controlling the actual launch, so i assume you'll leave that at 0 at this stage. and then `PARALLEL_UPLOAD_PROCESSING` appears to control the experiment for anyone not in `FULLY_PARALLEL...`, and `is_experiment_serial` and `is_experiment_parallel` are based on that value + whether the currently-running task was scheduled to be serial or scheduled to be parallel?

`Feature` supports non-bool values + more than 2 values if that helps. you could have a single `Feature` with variants `fully_serial`, `experiment`, and `fully_parallel` if that would simplify anything
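The three-variant idea could collapse the two lookups above into one. This is a hedged sketch assuming a `check_value(identifier=..., default=...)` interface like the snippet shows; the variant names follow the suggestion above, and `StubFeature` is a made-up stand-in for the real `Feature` class.

```python
# Hedged sketch: derive both flags from a single three-variant rollout value.
def processing_mode(feature, repoid):
    variant = feature.check_value(identifier=repoid, default="fully_serial")
    run_fully_parallel = variant == "fully_parallel"
    run_experiment = variant == "experiment"
    return run_fully_parallel, run_experiment

# Minimal stand-in for the real Feature class, for illustration only.
class StubFeature:
    def __init__(self, variant):
        self.variant = variant

    def check_value(self, identifier, default):
        return self.variant

assert processing_mode(StubFeature("experiment"), repoid=42) == (False, True)
```

With one variant per repo, the "both flags true" case can never arise, so the `False if run_fully_parallel else ...` guard disappears.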
that would make a lot of sense, yes. thanks for the suggestion.
# Behind parallel processing flag, save the CFF report to GCS so the parallel variant of
# finisher can build off of it later. Makes the assumption that the CFFs occupy the first
# j to i session ids where i is the max id of the CFFs and j is some integer less than i.
this assumption isn't always true: `next_session_number()` starts at `len(sessions)` and then returns the first unused number

for example:
- Commit A has CFF sessions 0, 1, 2 and gets a new session
  - We have 3 sessions so `next_session_id()` starts at 3
  - Session ID 3 is free so we use that for the new session
  - New session 3 invalidates 2 so we remove it
- Commit B has CFF sessions 0, 1, 3 and gets a new session
  - We have 3 sessions so `next_session_id()` starts at 3
  - Session ID 3 is taken so we use 4 for the new session
  - New session 4 invalidates 3 so we remove it
- Commit C has CFF sessions 0, 1, 4 and gets a new session
  - We have 3 sessions so `next_session_id()` starts at 3
  - Session ID 3 has become free again so we use that for the new session

the assumed behavior here would use ID=5 for the new session in Commit C but `next_session_id()` will actually use 3. this logic will still return an unused ID, but it may differ from the serial ID which makes comparison difficult

the ID scheme described in https://l.codecov.dev/LyFm6s (open on VPN bc private link shortener) would assign IDs consistently for both serial and parallel processing. we could ship that as its own change and not have to worry about session IDs at all anymore
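The walkthrough above can be simulated in a few lines. This is a sketch mirroring the described "start at `len(sessions)`, return the first unused number" search, not the actual implementation:

```python
def next_session_id(used_ids):
    # Mimics the described next_session_number(): start the search at the
    # number of existing sessions, then return the first unused ID.
    candidate = len(used_ids)
    while candidate in used_ids:
        candidate += 1
    return candidate

assert next_session_id({0, 1, 2}) == 3  # Commit A: 3 is free
assert next_session_id({0, 1, 3}) == 4  # Commit B: 3 taken, 4 is free
assert next_session_id({0, 1, 4}) == 3  # Commit C: 3 became free again, not 5
```

The Commit C case is exactly where the "contiguous block of CFF session IDs" assumption breaks down.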
considering the logic as-is, i don't remember whether/why we have to do this redis key update here in the first place. `tasks/upload.py` will set it if it isn't set here. do you know?
I don’t really know TBH, this is quite some obscure code/logic that hasn’t yet revealed its secrets to me.
With the session id allocations being such a big problem, why not just use the unique database ID of the `upload`? That should ideally solve all these problems, as the DB guarantees uniqueness here. Is there any reason not to do that?
I also believe this ID allocation is one of the reasons the "time under lock" is longer than it necessarily needs to be?
hmm. i'm not aware of a reason we couldn't use the database PK. that might be a great idea. in every place i can think of, these session IDs are just dict keys so they don't really need to be clustered/sequential, and i don't think we have to keep them stable across reports

a little code dive:
- `initialize_and_save_report()` is called in `tasks/upload.py` and `tasks/preprocess_upload.py`. carryforward inheritance happens in here (more specifically, in here)
- `create_new_report_for_commit()` generates the carryforward report. carried-forward sessions reuse their IDs from the parent report, this code does not know anything about the DB objects
- `save_full_report()` creates DB records for each of the sessions in the new report. we create new DB records for carried-forward sessions, we don't try to point back at the existing one
- `report.add_session()` is called in `process_raw_upload()`. we may not have the actual DB object here, not sure

in `save_full_report()` you could probably override the session ID for each session after you do the DB insert but before you serialize as chunks/report json, and then the `process_raw_upload()` bits are just plumbing + updating the shared API. this approach would make it harder to turn the `Upload` inserts in `save_full_report()` into a batch insert, but it should be possible. if nothing else you can get a stable order for the inserted IDs with raw SQL and then sort/update in python :P
with inserted as (
    insert into uploads (...)
    values (...), (...), (...)
    returning *
)
select id
from inserted
order by timestamp desc;
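The "override session IDs after insert" idea could be sketched like this. All names here are invented for illustration; the real `save_full_report()` works on Report/Session objects, not plain dicts.

```python
# Hypothetical sketch: insert each upload first, then re-key the in-memory
# sessions by their new database PKs before serializing chunks/report JSON.
def remap_sessions_to_db_ids(sessions, insert_upload):
    """sessions: {provisional_id: session}; insert_upload returns the DB PK."""
    remapped = {}
    for _old_id, session in sorted(sessions.items()):
        db_id = insert_upload(session)  # one insert per session, not batched
        remapped[db_id] = session
    return remapped
```

Because database PKs are globally unique, the serial and parallel pipelines would assign the same key to the same upload, sidestepping the session-ID collision discussed earlier.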
# this should be enabled for the actual rollout of parallel upload processing.
# if PARALLEL_UPLOAD_PROCESSING_BY_REPO.check_value(
#     "this should be the repo id"
# ):
#     upload_obj.state_id = UploadState.PARALLEL_PROCESSED.db_id
#     upload_obj.state = "parallel_processed"
# else:
if you haven't found it, this enum value is what this commented out block is about

a state of `PROCESSED` implies the upload's data will be found if you call `get_existing_report_for_commit()`. a state of `PARALLEL_PROCESSED` indicates `UploadProcessorTask` has finished but `UploadFinisherTask` has not gotten to it yet. don't remember if the distinction mattered
fully forgot about this bit
I totally agree with this. I’m tempted to just create a new task for parallel processing which removes all the code related to handling multiple uploads in one chunk, and have ideas for further simplification ahead of time.
i think some of the brittleness is inherent to the "kick off parallel tasks but copy all the inputs and then skip saving the outputs" approach to verification, but i'd be happy to be proved wrong haha. my suggested alternative requires us to handle any GH request failure non-fatally, which may be easier said than done.

i should have said this in my initial comment but: i can't see any specific problems in the PR apart from the edge case with IDs which only matters for comparison with serial results, and that was already there. i think this is all logically correct, and less fragile than it was before. i am excited to see this PR and for this project to get some momentum
fixes codecov/engineering-team#2450