Replace runs-on.com with ec2-github-runner for GPU CI by jack-champagne · Pull Request #7 · harmoniqs/CuQuantum.jl

jack-champagne · 2026-04-10T01:48:20Z

Summary

Delete .runs-on.yml (no longer using runs-on.com service)
Rewrite gpu-test as three-job pattern: start-gpu-runner → gpu-test → stop-gpu-runner
GPU tests now run on PRs (non-fork) and main pushes, not just main
Add gpu-benchmark.yml for on-demand benchmarks (T4, A10G, A100, H100)
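
As a rough illustration of the on-demand benchmark workflow, the fragment below sketches how a `workflow_dispatch` input could select the GPU. Only the GPU models and the secret names come from this PR. The input name `gpu`, the `INSTANCE_TYPES` map, the instance sizes, the region, OIDC role assumption, and the use of `machulav/ec2-github-runner@v2` are assumptions made for the example.

```yaml
# Hypothetical sketch of gpu-benchmark.yml; names, sizes, and versions are assumptions.
name: GPU benchmark
on:
  workflow_dispatch:
    inputs:
      gpu:
        description: GPU model to benchmark on
        type: choice
        default: T4
        options: [T4, A10G, A100, H100]

permissions:
  id-token: write  # assumed: the AWS role is assumed via OIDC
  contents: read

env:
  # Assumed mapping from GPU model to an EC2 instance type that carries it.
  INSTANCE_TYPES: '{"T4":"g4dn.xlarge","A10G":"g5.xlarge","A100":"p4d.24xlarge","H100":"p5.48xlarge"}'

jobs:
  start-gpu-runner:
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.start.outputs.label }}
      ec2-instance-id: ${{ steps.start.outputs.ec2-instance-id }}
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_GPU_RUNNER_ROLE_ARN }}
          aws-region: us-east-1  # assumed region
      - id: start
        uses: machulav/ec2-github-runner@v2
        with:
          mode: start
          github-token: ${{ secrets.GH_RUNNER_PAT }}
          ec2-image-id: ${{ secrets.AWS_GPU_RUNNER_AMI_ID }}
          # Look up the instance type for the GPU chosen at dispatch time.
          ec2-instance-type: ${{ fromJSON(env.INSTANCE_TYPES)[inputs.gpu] }}
          subnet-id: ${{ secrets.AWS_GPU_RUNNER_SUBNET_ID }}
          security-group-id: ${{ secrets.AWS_GPU_RUNNER_SG_ID }}
          iam-role-name: ${{ secrets.AWS_GPU_RUNNER_INSTANCE_PROFILE }}
  # The benchmark and stop jobs would follow the same start/run/stop pattern as gpu-test.
```

Keeping the mapping in one JSON string means adding a GPU model touches only the `options` list and the map.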

Dependencies

harmoniqs/aws-infra#23 must be deployed first (creates IAM roles + security group)
GitHub secrets must be configured: GH_RUNNER_PAT, AWS_GPU_RUNNER_ROLE_ARN, AWS_GPU_RUNNER_SUBNET_ID, AWS_GPU_RUNNER_SG_ID, AWS_GPU_RUNNER_AMI_ID, AWS_GPU_RUNNER_INSTANCE_PROFILE
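
To show how the three-job pattern and the secrets listed above might fit together, here is a minimal sketch. It assumes the `machulav/ec2-github-runner@v2` action, OIDC role assumption through `aws-actions/configure-aws-credentials@v4`, the `us-east-1` region, a `g4dn.xlarge` instance, and the standard `julia-actions` steps; none of these details are confirmed by the PR text.

```yaml
# Sketch only: action versions, region, and instance type are assumptions.
name: CI
on:
  push:
    branches: [main]
  pull_request:

permissions:
  id-token: write  # assumed: the AWS role is assumed via OIDC
  contents: read

jobs:
  start-gpu-runner:
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.start.outputs.label }}
      ec2-instance-id: ${{ steps.start.outputs.ec2-instance-id }}
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_GPU_RUNNER_ROLE_ARN }}
          aws-region: us-east-1  # assumed region
      - id: start
        uses: machulav/ec2-github-runner@v2
        with:
          mode: start
          github-token: ${{ secrets.GH_RUNNER_PAT }}
          ec2-image-id: ${{ secrets.AWS_GPU_RUNNER_AMI_ID }}
          ec2-instance-type: g4dn.xlarge  # assumed T4 instance
          subnet-id: ${{ secrets.AWS_GPU_RUNNER_SUBNET_ID }}
          security-group-id: ${{ secrets.AWS_GPU_RUNNER_SG_ID }}
          iam-role-name: ${{ secrets.AWS_GPU_RUNNER_INSTANCE_PROFILE }}

  gpu-test:
    needs: start-gpu-runner
    # Run on the self-hosted runner that the start job registered.
    runs-on: ${{ needs.start-gpu-runner.outputs.label }}
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1

  stop-gpu-runner:
    needs: [start-gpu-runner, gpu-test]
    if: always()  # clean up even when gpu-test fails or is cancelled
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_GPU_RUNNER_ROLE_ARN }}
          aws-region: us-east-1  # assumed region
      - uses: machulav/ec2-github-runner@v2
        with:
          mode: stop
          github-token: ${{ secrets.GH_RUNNER_PAT }}
          label: ${{ needs.start-gpu-runner.outputs.label }}
          ec2-instance-id: ${{ needs.start-gpu-runner.outputs.ec2-instance-id }}
```

The start job hands the runner label and instance ID to the other two jobs through `outputs`, which is what lets the stop job terminate exactly the instance that was launched.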

Test plan

aws-infra#23 merged and applied to staging/prod
GitHub secrets configured from terraform outputs
Classic PAT created with repo scope
Open test PR to verify GPU runner spins up, tests pass, instance terminates
Verify that the if: always() cleanup runs when a job fails

Add Aqua and JET to [extras]/[targets], create test/aqua.jl with all checks enabled (ambiguities disabled for CUDA.jl noise), add compat entries for all extras, and suppress known stale-dep false positives. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace anonymous NamedTuple refs with a concrete mutable CallbackRef type that registers a finalizer to automatically unregister both forward and gradient callbacks from the global registry when GC'd or explicitly closed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace Any-typed GC anchor fields in ElementaryOperator, MatrixOperator, OperatorTerm, Operator, and WorkStream with concrete typed fields. Move callbacks.jl include before operators.jl so CallbackRef is in scope. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…sion Added docstrings for CUDENSITYMATError and all destroy_* functions that lacked them. Added export statements for the utility functions in state.jl, the batch/append/prepare/compute functions in operators.jl, and the prepare/compute functions in expectation.jl and spectrum.jl. Removed warnonly = [:missing_docs] from docs/make.jl now that all exported symbols are documented. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add .runs-on.yml defining a 'gpu' runner profile (g4dn.xlarge, T4, ubuntu22-gpu-x64, spot, 45 min timeout) and update the gpu-test CI job to use it via the runs-on label syntax. Add JULIA_CUDA_MEMORY_POOL=none and CUDA_VISIBLE_DEVICES='0' env vars. Job remains gated to main/tag pushes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Delete .runs-on.yml (no longer using runs-on.com service)
- Rewrite gpu-test as three-job pattern: start-gpu-runner, gpu-test, stop-gpu-runner
- GPU tests now run on PRs (non-fork) and main pushes, not just main
- Add gpu-benchmark.yml for on-demand benchmarks (T4 through H100)
- Fork PR protection via repo name check on start-gpu-runner
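
The fork protection mentioned in the last item could look roughly like the fragment below. This is a sketch under assumptions: the exact expression used in the PR is not shown, and the placeholder `run` steps stand in for the real start, test, and stop steps.

```yaml
# Fragment only: shows job-level conditions; step bodies are placeholders.
jobs:
  start-gpu-runner:
    # Pushes always qualify; PRs qualify only when the head branch lives in this repo.
    if: >-
      github.event_name != 'pull_request' ||
      github.event.pull_request.head.repo.full_name == github.repository
    runs-on: ubuntu-latest
    steps:
      - run: echo "launch the EC2 runner here"

  gpu-test:
    # Skipped automatically when start-gpu-runner is skipped.
    needs: start-gpu-runner
    runs-on: ubuntu-latest  # placeholder; the real job targets the EC2 runner label
    steps:
      - run: echo "run the GPU tests here"

  stop-gpu-runner:
    needs: [start-gpu-runner, gpu-test]
    # always() alone would also fire when the start job was skipped,
    # so only clean up if a runner was actually launched.
    if: always() && needs.start-gpu-runner.result == 'success'
    runs-on: ubuntu-latest
    steps:
      - run: echo "terminate the EC2 runner here"
```

Repository secrets are not passed to workflows triggered by `pull_request` events from forks, so the guard mainly avoids a start job that would fail for lack of credentials.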

jack-champagne · 2026-04-10T02:08:23Z

Superseded by direct merge from clean branch off main

jack-champagne and others added 16 commits April 3, 2026 01:43

chore: replace JuliaFormatter with Runic.jl and apply formatting baseline

ba96381

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: extend LinearAlgebra.norm/tr/dot instead of shadowing them

1446f47

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: allow LinearAlgebra.tr on both pure and mixed states

8bcf7ea

fix: remove CUDA.device!() side effect; change batch_size default from 0 to 1

61cabc4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: add Base.close/isopen and _destroy! to all handle types with try/catch finalizer safety

83ad074

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

style: use finalizer(_destroy!, obj) pattern consistently in WorkStream

2b8b938

test: fix TEST_BATCH_SIZES (remove 0), add sync_and_pull helper, parameterize dtype/batch_size tests

fd95384

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: fix trajectory.csv to use tempdir, add batch API tests, fill Phase 6 gap

8d7a709

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

perf: remove eager CUDA.synchronize() from compute functions; callers sync explicitly

89b20c6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: correct trajectory print message; name CallbackRef in wrap_callback docstrings

a927b4b

jack-champagne closed this Apr 10, 2026

jack-champagne wants to merge 16 commits into main from jc/gpu-runner-ec2
