Use GitHub-hosted aarch64 runners #2202

Open
wants to merge 14 commits into
base: main

mathbunnyru
Member

Describe your changes

Issue ticket if applicable

Fix: #2201
Fix: #2198

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

@manics
Contributor

manics commented Jan 16, 2025

The ubuntu-24.04-arm image has some bugs, including incorrect environment variables:
jupyterhub/zero-to-jupyterhub-k8s#3605
I don't know if this is affecting your builds.

@consideRatio
Collaborator

Beautiful!!! ❤️ 🎉

@consideRatio
Collaborator

A job failed in this PR but not on the main branch. One difference between these jobs is the test runner's Python version, which is 3.10 (main branch) vs 3.13 (this PR):

image

image

In the x86_64 test, on both the main branch and this PR, the test runner's Python version is 3.12:

image


Maybe the Python version on the aarch64 runners is floating around a bit? Perhaps the self-hosted aarch64 runners map 3.x to 3.10, the new aarch64 runners map 3.x to 3.13, and the x86_64 runners map 3.x to 3.12?

I figure the fix could be to pin to 3.12.
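
For illustration, pinning could look roughly like the step below. This is a minimal sketch, not the commit from this PR: the step name is hypothetical, and it assumes the workflow installs Python through actions/setup-python with a floating 3.x version today.

```yaml
steps:
  - name: Set up Python (hypothetical step name)
    uses: actions/setup-python@v5
    with:
      # Quoted so YAML keeps the version as a string ("3.10" would otherwise parse as 3.1)
      python-version: "3.12"
```

With an explicit minor version, every runner type should resolve the same interpreter instead of whatever 3.x happens to map to on that image.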

@consideRatio
Collaborator

consideRatio commented Jan 18, 2025

I pushed a commit to see if it helps to pin to 3.12!

Edit: it seems so, or it was a coincidence - now it works at least!

@mathbunnyru
Member Author

Thanks @consideRatio!
Let's try to restart it a few times to see if it works.

@mathbunnyru mathbunnyru reopened this Jan 19, 2025
@mathbunnyru
Member Author

zstd: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? - I don't think Docker is stable enough on these runners (and I suspect that might have been the cause of the previous failure as well).

@consideRatio
Collaborator

Oh, hmmm, I wonder if it could be a timing thing: maybe the CI scripts start before the Docker daemon is up on arm64, but not on x64.

These things are messy to debug :/

@mathbunnyru
Member Author

> Oh, hmmm, I wonder if it could be a timing thing: maybe the CI scripts start before the Docker daemon is up on arm64, but not on x64.
>
> These things are messy to debug :/

I think we have quite a nice example of this happening!

This job only runs docker build and nothing else:
https://github.com/jupyter/docker-stacks/actions/runs/12999905803/job/36256210849?pr=2202

@mathbunnyru
Member Author

I added a sleep 10 - let's see if this eliminates the problem.
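
A delay like this could be expressed as a workflow step along the following lines. This is a minimal sketch, not the commit from this PR: both step names are hypothetical, and the polling variant assumes the failures come from the Docker daemon not being ready when the job starts.

```yaml
steps:
  # Simplest form: a fixed delay before the first Docker command
  - name: Wait for Docker daemon, fixed delay (hypothetical step)
    run: sleep 10
    shell: bash

  # Alternative: poll until `docker info` succeeds, giving up after 60 seconds
  - name: Wait for Docker daemon, polling (hypothetical step)
    run: |
      timeout 60 bash -c 'until docker info > /dev/null 2>&1; do sleep 1; done'
    shell: bash
```

The polling form returns as soon as the daemon answers and fails the step if it never does, which makes a timing problem easier to tell apart from other failures.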

@mathbunnyru
Member Author

Wow, it seems that even @actions/checkout can fail (possibly for the same reason): https://github.com/jupyter/docker-stacks/actions/runs/13000122202/job/36256874950?pr=2202#step:3:20

And we can't have this workaround in create-dev-env (because we need to check out first).

@@ -41,7 +41,7 @@ jobs:
     steps:
       # Image with CUDA needs extra disk space
       - name: Free disk space 🧹
-        if: contains(inputs.variant, 'cuda') && inputs.platform == 'x86_64'
+        if: contains(inputs.variant, 'cuda') && runner.arch == 'X64'
Member Author

@mathbunnyru mathbunnyru Jan 27, 2025

Let's use inputs.platform when we want to name things.
runner.arch is better in this case - it's the standard way to check the architecture on GitHub runners.
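
As a small illustration of the runner.arch context, here is a sketch with hypothetical step names. The possible values are X86, X64, ARM, and ARM64, so an aarch64 runner reports ARM64 rather than aarch64.

```yaml
steps:
  # runner.arch is one of: X86, X64, ARM, ARM64
  - name: Show runner architecture (hypothetical step)
    run: echo "Running on ${{ runner.arch }}"
    shell: bash

  # Runs only on aarch64 runners, regardless of how the job itself is named
  - name: Step limited to aarch64 runners (hypothetical step)
    if: runner.arch == 'ARM64'
    run: uname -m
    shell: bash
```

Because the value comes from the runner itself, the condition stays correct even if the workflow input used for naming changes.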

@mathbunnyru mathbunnyru reopened this Jan 28, 2025
@mathbunnyru
Member Author

It seems that GitHub-hosted aarch64 runners aren't stable enough yet. Let's wait for a while; I hope they will get better.

@mathbunnyru
Member Author

The relevant discussion: https://github.com/orgs/community/discussions/148648

Successfully merging this pull request may close these issues.

  • Linux arm64 hosted runners now available for free 😃
  • Julia 1.11.2 hangs the aarch64 builds