fix ReFrame issues on NVIDIA Grace #988

Open
wants to merge 9 commits into
base: 2023.06-software.eessi.io

trz42
Collaborator

@trz42 trz42 commented Mar 28, 2025

Let's see if CI for eessi_container.sh is fixed by installing another package in install_apptainer_ubuntu.sh, and also if ReFrame tests succeed.

We should also add a package to be built, to test whether building still works when containers are run with the additional argument --contain.

truib added 2 commits March 28, 2025 22:19

@trz42 trz42 added labels bug (Something isn't working), tests (Related to software testing), 2023.06-software.eessi.io (2023.06 version of software.eessi.io) Mar 28, 2025

eessi-bot bot commented Mar 28, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat


eessi-bot bot commented Mar 28, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-trz42

Instance trz42-GH200-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@trz42 trz42 changed the title from "fix CI and ReFrame issues on NVIDIA Grace" to "fix ReFrame issues on NVIDIA Grace" Mar 28, 2025
@trz42
Collaborator Author

trz42 commented Mar 28, 2025

bot: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
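
For readers unfamiliar with the command syntax: the line above is a `bot:` prefix, an action (`build`), and space-separated `key:value` filters. The sketch below parses such a line and checks it against one instance's configuration, which would be consistent with the other instances in this thread reporting that no jobs were submitted. It is an illustration only: `parse_bot_command` and `instance_accepts` are hypothetical names, and exact-match filtering is an assumption, not the bot's actual matching logic.

```python
def parse_bot_command(line: str) -> tuple[str, dict[str, str]]:
    """Split 'bot: <action> key:value ...' into the action and its filters."""
    prefix = "bot:"
    if not line.startswith(prefix):
        raise ValueError("not a bot command")
    tokens = line[len(prefix):].split()
    if not tokens:
        raise ValueError("missing action")
    action, *args = tokens
    filters = {}
    for arg in args:
        # Split on the first ':' only, so values may contain '/' or '.'.
        key, sep, value = arg.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed filter: {arg!r}")
        filters[key] = value
    return action, filters


def instance_accepts(filters, name, repositories, architectures):
    """Assumed rule: every filter given must match this instance exactly."""
    allowed = {
        "instance": {name},
        "repository": set(repositories),
        "architecture": set(architectures),
    }
    return all(filters[k] in allowed[k] for k in filters if k in allowed)


cmd = ("bot: build instance:trz42-GH200-jr "
       "repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace")
action, filters = parse_bot_command(cmd)
print(action, filters)
```

With the configurations listed earlier in this thread, the filters would match trz42-GH200-jr but not eessi-bot-mc-aws or eessi-bot-mc-azure, since the `instance` filter names a single instance.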

eessi-bot bot commented Mar 28, 2025

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace resulted in:

    • no jobs were submitted

eessi-bot bot commented Mar 28, 2025

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace resulted in:

    • no jobs were submitted

@eessi-bot-toprichard
Updates by the bot instance rt-Grace-jr
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-trz42
eessi-bot-trz42 bot commented Mar 28, 2025

Updates by the bot instance trz42-GH200-jr
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace resulted in:

@eessi-bot-trz42

eessi-bot-trz42 bot commented Mar 28, 2025

New job on instance trz42-GH200-jr for CPU micro-architecture aarch64-nvidia-grace for repository eessi.io-2023.06-software in job dir /p/project1/ceasybuilders/bot-trz42/jobs/2025.03/pr_988/13545111

date | job status | comment
Mar 28 22:10:42 UTC 2025 | submitted | job id 13545111 awaits release by job manager
Mar 28 22:10:50 UTC 2025 | released | job awaits launch by Slurm scheduler
Mar 28 22:11:53 UTC 2025 | running | job 13545111 is running
Mar 28 22:42:45 UTC 2025 | finished | 😢 FAILURE
  Details
    ✅ job output file slurm-13545111.out
    ❌ found message matching FATAL:
    ✅ no message matching ERROR:
    ✅ no message matching FAILED:
    ✅ no message matching required modules missing:
    ❌ no message matching No missing installations
    ✅ found message matching .tar.gz created!
  Artefacts
    No artefacts were created or found.
Mar 28 22:42:45 UTC 2025 | test result | 🤷 UNKNOWN
  • Job test file _bot_job13545111.test does not exist in job directory, or parsing it failed.
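
The checklist in the job report above reads like a series of pattern searches over the Slurm output file. A minimal sketch of that idea follows, assuming a job passes only when every "bad" pattern is absent and every "good" pattern is present. The pattern strings are taken from the report; the function name `check_job_output` and the pass/fail rule are assumptions, not the bot's actual check script.

```python
import re

# (pattern, must_be_present) pairs mirroring the checklist in the report.
CHECKS = [
    (r"FATAL:", False),
    (r"ERROR:", False),
    (r"FAILED:", False),
    (r"required modules missing:", False),
    (r"No missing installations", True),
    (r"\.tar\.gz created!", True),
]


def check_job_output(text):
    """Return (overall_ok, [(pattern, ok), ...]) for a job output text."""
    results = []
    for pattern, must_be_present in CHECKS:
        found = re.search(pattern, text) is not None
        # A check passes when presence matches the expectation.
        results.append((pattern, found == must_be_present))
    return all(ok for _, ok in results), results


# Hypothetical output resembling the failed job: FATAL present,
# "No missing installations" absent, tarball message present.
sample = "FATAL: something went wrong\nexample.tar.gz created!\n"
ok, results = check_job_output(sample)
print("SUCCESS" if ok else "FAILURE")
for pattern, passed in results:
    print("ok  " if passed else "FAIL", pattern)
```

Under this rule a tarball message alone is not sufficient, which matches the failed job above: `.tar.gz created!` was found, yet the job was reported as a failure.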

@trz42
Collaborator Author

trz42 commented Mar 28, 2025

Rebuild...
bot: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace

eessi-bot bot commented Mar 28, 2025

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace resulted in:

    • no jobs were submitted

eessi-bot bot commented Mar 28, 2025

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace resulted in:

    • no jobs were submitted

@eessi-bot-toprichard
Updates by the bot instance rt-Grace-jr
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-trz42
eessi-bot-trz42 bot commented Mar 28, 2025

Updates by the bot instance trz42-GH200-jr
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace resulted in:

@eessi-bot-trz42

eessi-bot-trz42 bot commented Mar 28, 2025

New job on instance trz42-GH200-jr for CPU micro-architecture aarch64-nvidia-grace for repository eessi.io-2023.06-software in job dir /p/project1/ceasybuilders/bot-trz42/jobs/2025.03/pr_988/13545157

date | job status | comment
Mar 28 22:42:45 UTC 2025 | submitted | job id 13545157 awaits release by job manager
Mar 28 22:43:49 UTC 2025 | released | job awaits launch by Slurm scheduler
Mar 28 22:44:52 UTC 2025 | running | job 13545157 is running
Mar 28 23:14:38 UTC 2025 | finished | 😁 SUCCESS
  Details
    ✅ job output file slurm-13545157.out
    ✅ no message matching FATAL:
    ✅ no message matching ERROR:
    ✅ no message matching FAILED:
    ✅ no message matching required modules missing:
    ✅ found message(s) matching No missing installations
    ✅ found message matching .tar.gz created!
  Artefacts
    eessi-2023.06-software-linux-aarch64-nvidia-grace-1743202985.tar.gz
    size: 0 MiB (426755 bytes)
    entries: 52
    modules under 2023.06/software/linux/aarch64/nvidia/grace/modules/all
      BWA/0.7.18-GCCcore-13.2.0.lua
    software under 2023.06/software/linux/aarch64/nvidia/grace/software
      BWA/0.7.18-GCCcore-13.2.0
    other under 2023.06/software/linux/aarch64/nvidia/grace
      no other files in tarball
Mar 28 23:14:38 UTC 2025 | test result | 🤷 UNKNOWN
  • Job test file _bot_job13545157.test does not exist in job directory, or parsing it failed.
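
The artefact summary above groups tarball entries into modules, software, and other files under the architecture prefix. A rough sketch of how such a summary could be reproduced locally with Python's `tarfile` module is shown below. The grouping rules and the names `classify_entries` and `summarize_tarball` are assumptions made for illustration; the bot may group entries differently.

```python
import tarfile


def classify_entries(names, prefix):
    """Group entry names under `prefix` into (modules, software, other)."""
    modules, software, other = set(), set(), set()
    for name in names:
        if not name.startswith(prefix + "/"):
            continue
        rel = name[len(prefix) + 1:]
        parts = rel.split("/")
        if parts[:2] == ["modules", "all"]:
            # Module files look like modules/all/<name>/<version>.lua
            if len(parts) == 4 and rel.endswith(".lua"):
                modules.add("/".join(parts[2:]))
        elif parts[0] == "software":
            # Installation directories look like software/<name>/<version>
            if len(parts) >= 3:
                software.add("/".join(parts[1:3]))
        elif parts[0] != "modules":
            other.add(rel)
    return sorted(modules), sorted(software), sorted(other)


def summarize_tarball(path, prefix):
    """Print entry count and grouped contents of a gzip-compressed tarball."""
    with tarfile.open(path, "r:gz") as tar:
        names = tar.getnames()
    modules, software, other = classify_entries(names, prefix)
    print(f"entries: {len(names)}")
    print("modules:", modules or "none")
    print("software:", software or "none")
    print("other:", other or "no other files in tarball")
```

For example, `summarize_tarball(path, "2023.06/software/linux/aarch64/nvidia/grace")` could be run on a downloaded artefact to compare against the report.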

truib added 2 commits March 29, 2025 00:13
…-layer into fix_CI_and_reframe_issues_on_grace
@trz42
Collaborator Author

trz42 commented Mar 28, 2025

Rebuild after adding /dev to extra bind paths...
bot: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
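
As background for the rebuild above: Apptainer's `--contain` option gives the container a minimal `/dev` and empty `/tmp` and home directories instead of sharing the host's, so host paths that are still needed have to be bound explicitly. That is presumably why `/dev` was added to the extra bind paths. A minimal sketch of how such a command line could be assembled, assuming a plain `apptainer exec` call; the helper name `apptainer_exec_cmd` and the image name are hypothetical, and this is not how eessi_container.sh builds its command.

```python
import shlex


def apptainer_exec_cmd(image, command, bind_paths=(), contain=True):
    """Build an argv list for 'apptainer exec' with optional --contain/--bind."""
    argv = ["apptainer", "exec"]
    if contain:
        # --contain: minimal /dev, empty /tmp and $HOME inside the container
        argv.append("--contain")
    for path in bind_paths:
        # --bind makes a host path visible inside the container
        argv += ["--bind", path]
    argv.append(image)
    argv += list(command)
    return argv


# 'example.sif' is a placeholder image name, not one used in this PR.
argv = apptainer_exec_cmd("example.sif", ["ls", "/dev"], bind_paths=["/dev"])
print(shlex.join(argv))
```

Building the command as a list rather than a string keeps each argument separate, so paths containing spaces do not need manual quoting when the list is passed to `subprocess.run`.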

eessi-bot bot commented Mar 28, 2025

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace resulted in:

    • no jobs were submitted

eessi-bot bot commented Mar 28, 2025

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace resulted in:

    • no jobs were submitted

@eessi-bot-toprichard
Updates by the bot instance rt-Grace-jr
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-trz42
eessi-bot-trz42 bot commented Mar 28, 2025

Updates by the bot instance trz42-GH200-jr
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace resulted in:

@eessi-bot-trz42

eessi-bot-trz42 bot commented Mar 28, 2025

New job on instance trz42-GH200-jr for CPU micro-architecture aarch64-nvidia-grace for repository eessi.io-2023.06-software in job dir /p/project1/ceasybuilders/bot-trz42/jobs/2025.03/pr_988/13545169

date | job status | comment
Mar 28 23:16:59 UTC 2025 | submitted | job id 13545169 awaits release by job manager
Mar 28 23:17:42 UTC 2025 | released | job awaits launch by Slurm scheduler
Mar 28 23:18:46 UTC 2025 | running | job 13545169 is running
Mar 28 23:56:45 UTC 2025 | finished | 😁 SUCCESS
  Details
    ✅ job output file slurm-13545169.out
    ✅ no message matching FATAL:
    ✅ no message matching ERROR:
    ✅ no message matching FAILED:
    ✅ no message matching required modules missing:
    ✅ found message(s) matching No missing installations
    ✅ found message matching .tar.gz created!
  Artefacts
    eessi-2023.06-software-linux-aarch64-nvidia-grace-1743204932.tar.gz
    size: 0 MiB (428313 bytes)
    entries: 52
    modules under 2023.06/software/linux/aarch64/nvidia/grace/modules/all
      BWA/0.7.18-GCCcore-13.2.0.lua
    software under 2023.06/software/linux/aarch64/nvidia/grace/software
      BWA/0.7.18-GCCcore-13.2.0
    other under 2023.06/software/linux/aarch64/nvidia/grace
      no other files in tarball
Mar 28 23:56:45 UTC 2025 | test result | 🤷 UNKNOWN
  • Job test file _bot_job13545169.test does not exist in job directory, or parsing it failed.

…-layer into fix_CI_and_reframe_issues_on_grace
Labels
2023.06-software.eessi.io (2023.06 version of software.eessi.io), bug (Something isn't working), tests (Related to software testing)
2 participants