Have the same lock for upload + preprocess task #502
base: main

Changes from 1 commit
```diff
@@ -24,6 +24,7 @@
 from services.yaml import save_repo_yaml_to_database_if_needed
 from services.yaml.fetcher import fetch_commit_yaml_from_provider
 from tasks.base import BaseCodecovTask
+from tasks.upload import UPLOAD_LOCK_NAME

 log = logging.getLogger(__name__)

```
```diff
@@ -48,7 +49,8 @@ def run_impl(
             "Received preprocess upload task",
             extra=dict(repoid=repoid, commit=commitid, report_code=report_code),
         )
-        lock_name = f"preprocess_upload_lock_{repoid}_{commitid}_{report_code}"
+        lock_name = UPLOAD_LOCK_NAME(repoid, commitid)
+        # lock_name = f"preprocess_upload_lock_{repoid}_{commitid}_{report_code}"
```
**Review comment:** Why keep it as a comment?

**Reply:** caaaause I forgot 🙃
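For readers without the codebase at hand: `UPLOAD_LOCK_NAME` is imported from `tasks/upload.py`, but its definition is not part of this diff. A minimal sketch of what such a shared lock-name helper could look like, assuming it is a callable that builds one Redis key per (repo, commit) pair; the key format below is hypothetical:

```python
# Hypothetical sketch -- the real UPLOAD_LOCK_NAME lives in tasks/upload.py
# and is not shown in this diff. Assumed shape: a callable that builds the
# same Redis key for both the Upload and PreProcess tasks, so the two task
# types serialize on one shared lock instead of each using a private one.
def UPLOAD_LOCK_NAME(repoid: int, commitid: str) -> str:
    return f"upload_lock_{repoid}_{commitid}"  # key format is an assumption
```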
```diff
         redis_connection = get_redis_connection()
         # This task only needs to run once per commit (per report_code)
         # To generate the report. So if one is already running we don't need another
```
```diff
@@ -62,7 +64,10 @@ def run_impl(
         with redis_connection.lock(
             lock_name,
             timeout=60 * 5,
-            blocking_timeout=None,
+            # This is the time that this task will wait to get the lock. This should
+            # be non-zero as otherwise it waits indefinitely to get the lock, and ideally smaller than
+            # the blocking timeout for the upload task so this one goes first, although it can go second
```
**Review comment:** It's not the order that matters. The reason the blocking timeout should be smaller is to avoid a scenario in which the PreProcess tasks wait forever and always get the locks. This can drain Upload tasks (because they wait a limited time and retry a limited number of times). If only PreProcess tasks run for a commit, that's a problem. If only Upload tasks run, it's OK.

**Review comment:** This is also the reason why nuking the PreProcess task makes sense.

**Review comment:** I'm kinda realizing now that the current setup could lead us into trouble. Consider this scenario: Upload waits 5s and doesn't get the lock. It retries (after 20s), then waits 5 more seconds for the lock and fails. It retries again (after 40s) and waits 5 more seconds. Then it will not retry again. In this scenario no Upload task would run for the commit and nothing would be processed.

**Reply:** That makes sense. So if we just ensure the upload task waits at least 5 minutes in the worst-case scenario, that would fix this, right? Every unsuccessful run goes through `blocking_timeout` (5) + `retry_countdown` (`20 * 2**self.request.retries`), so it goes like:

0 retries: 5 + 20 = 25

So if we'd allow this to retry at most 3 times, this could solve it, or we could tweak the specific retry values. That's one way. Another is to shorten the time during preprocess and fit the upload retries around that. There's also the elimination of the preprocess task altogether, which I can ask other people/engineers about, but I suspect that will be met with more friction + unforeseen things because we're suddenly unsupporting a command. Although if we fundamentally think this task is more harmful than not, it's worth pushing back to get rid of it. Let me know what you think about this.

**Reply:** I would push for the elimination of the preprocess task altogether. It would be a good idea to get other opinions from the team, definitely. It is not true that "we're suddenly unsupporting a command" if we drop that task: the API would still create an empty report. In the meantime we can go with the proposal in the other PR to alleviate customers' problems (although for the record I don't particularly like that solution, but that's just one opinion).
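To make the arithmetic in that exchange concrete, here is a small sketch of the worst-case wait math. It assumes a 5s `blocking_timeout` for the Upload task, the `20 * 2**retries` countdown quoted above, and the 5-minute lock `timeout` from the diff; the retry caps are illustrative:

```python
# Sketch of the drain scenario discussed above. Assumptions:
#   BLOCKING_TIMEOUT: per-attempt wait for the Redis lock (5s, per the thread)
#   countdown:        20 * 2**retries between attempts (quoted from the thread)
#   LOCK_TIMEOUT:     60 * 5 = 300s, how long PreProcess may hold the lock
BLOCKING_TIMEOUT = 5
LOCK_TIMEOUT = 60 * 5

def total_wait(max_retries: int) -> int:
    """Seconds elapsed across all attempts before the task gives up."""
    elapsed = 0
    for retries in range(max_retries + 1):
        elapsed += BLOCKING_TIMEOUT      # wait for the lock, fail to get it
        if retries < max_retries:
            elapsed += 20 * 2**retries   # countdown before the next attempt
    return elapsed

for n in (2, 3, 4):
    print(n, total_wait(n), total_wait(n) >= LOCK_TIMEOUT)
# 2 retries -> 75s, 3 -> 160s, 4 -> 325s. With only 2 retries the Upload
# task gives up after ~75s, far short of the 300s that PreProcess may hold
# the lock -- exactly the "nothing would be processed" scenario above.
```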
```diff
+            blocking_timeout=3,
         ):
             return self.process_impl_within_lock(
                 db_session=db_session,
```
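For reference, `blocking_timeout` in redis-py bounds how long lock *acquisition* may block, while `timeout` bounds how long the lock is *held* before auto-expiring; when acquisition fails inside the context-manager form, redis-py raises `redis.exceptions.LockError`. A minimal standalone illustration (the key name and connection are made up for the example):

```python
import redis
from redis.exceptions import LockError

r = redis.Redis()  # assumed local connection, for illustration only

try:
    # timeout: lock auto-expires after 5 minutes even if the holder dies.
    # blocking_timeout: give up acquiring after 3 seconds instead of forever.
    with r.lock("upload_lock_123_abc", timeout=60 * 5, blocking_timeout=3):
        pass  # the work that must not run concurrently for this commit
except LockError:
    # Another worker holds the lock; this mirrors the task's retry path.
    pass
```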
**Review comment:** I think it might be useful to add a comment explaining why this task needs the same lock as the upload task.
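One possible wording for that suggested comment, sketched from the discussion in this PR; the exact phrasing and the surrounding line are assumptions, not code from the branch:

```python
# Hypothetical wording for the suggested comment, based on this PR's
# discussion; the final text is up to the author.
lock_name = UPLOAD_LOCK_NAME(repoid, commitid)
# We intentionally share the Upload task's lock: PreProcess and Upload both
# touch the same report for (repoid, commitid), so they must not run
# concurrently. Using one lock name serializes the two task types on the
# same Redis key instead of only serializing PreProcess against itself.
```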