Combining cron-staging.yml and main.yml #1598
Conversation
Hi @wshanks, this was one of the issues I was assigned a few years back and was not able to fix. I am now open to any feedback on the PR related to the issue.
Thanks, @AbdullahKazi500! I read through the new workflow and it looks good. I just had a couple small comments.
There is something strange about the CI here. Maybe it is just unique to this PR. I have to approve the tests to start running, but that seems to have started a second run while the first run is stuck waiting for approval. The second run's jobs get canceled because the first run's jobs have higher priority, according to the message in the UI.
Hi, yeah, what you’re seeing is expected GitHub Actions behaviour, but it does look confusing at first glance. I believe it is happening for the following reasons.
In my workflow if you see I have defined:
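Roughly, a concurrency block of this shape (a sketch; the exact group key used in the workflow is an assumption):

```yaml
concurrency:
  # One active run per workflow per branch/PR; newer runs cancel older ones.
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
  cancel-in-progress: true
```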
This means only one workflow run per branch/PR is allowed at a time. When you approved the run, GitHub scheduled a new run (because the approval counts as a “re-run”), while the earlier “waiting for approval” run was still holding the concurrency lock. As a result, the new run gets cancelled in favour of the first one, even though the first run is stuck until you approve it.

Why it looks like a loop: essentially, GitHub keeps trying to create a new run when you approve, but concurrency gives priority to the oldest pending run. That leaves you with one “zombie” workflow blocking the new one.

There are a few options to fix or avoid this:
Approve immediately instead of triggering a new run; the first pending run should proceed.
Cancel the older “waiting for approval” run before approving a new one. That clears the concurrency lock.
Loosen the concurrency group definition if this behaviour becomes a recurring problem. For example, you could remove ${{ github.head_ref }} so that fork PRs don’t get stuck in this loop.
Alternatively, disable cancel-in-progress, but that means multiple runs per PR could execute simultaneously, which might not be what you want.
Okay, I got it straightened out to run this time at least, but it failed because your conditional syntax didn't work on PowerShell. By the way, is there any reason not to remove the main.yml and cron-staging.yml files as part of this PR? Removing main.yml now would be nice so that we don't have to run its tests before we can run ci.yml's tests.
Oh hey, yes, I will do that. Sorry for replying late; I was attending a conference.
I have deleted the main.yml in my forked branch.
Maybe it's because the step used bash-only conditionals inside a run: block; that might be why.
Can you also delete
Yes, I think that's why. Do you want to split the docs step into two steps using opposite `if:` conditions?
Okay, did that as well. Now looking into the shell issue.
Hi, I made some changes to the PR. By the way, do you want me to also apply the same qiskit-main cache separation idea to the stestr cache in tests, or should we leave that unified, since it’s small compared to the uv cache?
Nice! Everything is passing now. A quick review of what I think the caching is doing for us:

From what I understand, the uv cache saves the wheels whenever uv tries to install something that only has an sdist and so it has to build a wheel locally before installing. For the Qiskit main jobs, we are building Qiskit from source and so it gets into the uv cache. When I look at the cache sizes in the GitHub Actions UI, I see that the Qiskit main caches are around 380 MB while the others are only 8 MB. 380 MB is much bigger than the Qiskit main wheel, so I am not sure what all is happening there, but it seems better to avoid restoring the 380 MB for the runs that are not building Qiskit from source.

For the stestr cache, what is being stored is how long each test takes to run. stestr runs the tests in parallel and tries to split them up across different workers so that each worker will take about the same amount of time (and one worker doesn't get stuck with all the slow tests, slowing down the total run time). I don't expect the run time to change that much for Qiskit main compared to the released version of Qiskit, so the two cases should be able to use the same cache.

I think what you have right now is not quite the right behavior though. You are always restoring the cache but only pruning it for the non Qiskit main case. That means that each Qiskit main run will grow the stestr cache. We only need to keep the times from the most recent run, so we should always prune the history to keep it from growing bigger than necessary.

One thing I think we could do is follow this pattern of using restore and save separately to always restore the cache but only save it for the non Qiskit main runs, if we wanted to avoid Qiskit main runs having slightly different run times and the main run times leading to non-optimal test distribution for the release version runs. That might be overkill though.
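The restore/save split described above could look roughly like this with the separate `actions/cache/restore` and `actions/cache/save` actions (a sketch; the cache path and the `qiskit-main` matrix flag are assumptions about this workflow's layout):

```yaml
# Always restore the most recent stestr timing cache, whoever saved it.
- name: Restore stestr cache
  uses: actions/cache/restore@v4
  with:
    path: .stestr
    key: stestr-${{ runner.os }}-${{ github.run_id }}
    restore-keys: |
      stestr-${{ runner.os }}-

# ... run the test suite here ...

# Save an updated cache only for runs against released Qiskit, so
# qiskit-main timings never overwrite the release-version timings.
- name: Save stestr cache
  if: matrix.qiskit-main != true
  uses: actions/cache/save@v4
  with:
    path: .stestr
    key: stestr-${{ runner.os }}-${{ github.run_id }}
```

Using `github.run_id` in the key keeps each saved cache unique (caches are immutable, so saving under an existing key fails), while the `restore-keys` prefix match always picks up the newest one.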
Hi, I have updated the PR. One question: do you want me to also apply the same restore/save strategy to the uv cache for Qiskit main, so that we never save those 380 MB caches? Or should we keep them, since they might still be useful for repeated main builds?
I have tried this fix.
I think this all looks great now, @AbdullahKazi500! Thanks for all your work on this.
The PR flow looks like it is running well. I will merge now and try to trigger the Qiskit main flow. If there are any adjustments needed, we can do them in a follow-up. It is hard to test that part without merging.
Sure, thank you for the opportunity to work on this PR. This was going to be my first contribution to Qiskit, and although it took me almost three years to get back to it, I’m definitely open to tackling other issues in the repository. If you don’t mind me asking, I’ve noticed over the years that the maintainers and reviewers for the experiments package seem less active. Back then, contributors like Helena, Kanazawa, Yael, and others were very involved, whereas now I mostly see your commits. Are there any plans for migration or changes to the package’s maintenance in the near future?
Summary
Details
Merges duplicated logic between cron and main workflows.
Uses conditional if: checks to distinguish between schedule runs and regular PR/merge queue runs.
Removes unnecessary push trigger (superseded by merge_group).
Keeps merge_group as the required status check for branch protection.
Runs:
Normal py tests on PRs and merge queue.
qiskit-main tests and docs-qiskit-main build on scheduled cron runs.
Skips linting on cron runs to avoid redundant checks.
Unifies documentation build and artifact upload steps.
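The schedule/PR split described above can be sketched with event-based `if:` conditions (job names, steps, and the cron time here are illustrative, not taken from the actual workflow):

```yaml
name: CI
on:
  pull_request:
  merge_group:
  schedule:
    - cron: "0 5 * * *"  # illustrative cron time

jobs:
  tests:
    # Normal tests: PRs and merge queue only
    if: github.event_name != 'schedule'
    runs-on: ubuntu-latest
    steps:
      - run: echo "run normal test suite"
  tests-qiskit-main:
    # Qiskit main tests: scheduled cron runs only
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - run: echo "run tests against Qiskit main"
```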
Motivation
Simplifies CI by reducing duplication.
Aligns with new merge queue workflow.
Removes legacy testing on qiskit-ibmq-provider (now deprecated).
PR checklist (delete when all criteria are met)
I have read the guidelines in CONTRIBUTING.md.
I have added a release note (reno) if this change needs to be documented in the release notes.
Fixes #1073 (Streamline CI workflow)