Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevents race condition from multiple builds attempting to use/delete the same julia install #3662

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jagoosw
Copy link
Collaborator

@jagoosw jagoosw commented Jul 25, 2024

See #3661

@jagoosw
Copy link
Collaborator Author

jagoosw commented Jul 25, 2024

I'm not sure how this works for distributed tests but I don't think anything needs to change there

@jagoosw
Copy link
Collaborator Author

jagoosw commented Jul 25, 2024

I think everything got cancelled rather than failing but not sure why...

@glwagner
Copy link
Member

.buildkite/pipeline.yml Outdated Show resolved Hide resolved
.buildkite/pipeline.yml Outdated Show resolved Hide resolved
@glwagner
Copy link
Member

suggested some changes (otherwise we get a bunch of folders like "16574", "16344" with no other indication so I added a label)

Co-authored-by: Gregory L. Wagner <[email protected]>
@glwagner
Copy link
Member

This doesn't seem to fix the problem: https://buildkite.com/clima/oceananigans/builds/16499#0190eab5-ee7c-47da-8bb6-f0026a31e110

Although I was hopeful. But perhaps having separate depots is enough...

One issue could be that init is not doing enough somehow. So additional actions are taken after init that conflict with each other.

@glwagner
Copy link
Member

I wonder if we are not instantiating the test environment correctly.

We do instantiate here:

- "$SVERDRUP_HOME/julia-$JULIA_VERSION/bin/julia -O0 --color=yes --project -e 'using Pkg; Pkg.instantiate(; verbose=true)'"

but the segmentation fault is coming from Pkg:

Stacktrace:
--
  | [1] pipeline_error
  | @ ./process.jl:565 [inlined]
  | [2] read(cmd::Cmd)
  | @ Base ./process.jl:449
  | [3] collect_artifacts(pkg_root::String; platform::Base.BinaryPlatforms.Platform)
  | @ Pkg.Operations /net/ocean/home/data44/data5/glwagner/oceananigans-buildkite-16499/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:720
  | [4] collect_artifacts
  | @ /net/ocean/home/data44/data5/glwagner/oceananigans-buildkite-16499/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:706 [inlined]
  | [5] download_artifacts(env::Pkg.Types.EnvCache; platform::Base.BinaryPlatforms.Platform, julia_version::VersionNumber, verbose::Bool, io::Base.DevNull)
  | @ Pkg.Operations /net/ocean/home/data44/data5/glwagner/oceananigans-buildkite-16499/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:752
  | [6] up(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}, level::Pkg.Types.UpgradeLevel; skip_writing_project::Bool, preserve::Nothing)
  | @ Pkg.Operations /net/ocean/home/data44/data5/glwagner/oceananigans-buildkite-16499/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1542
  | [7] up(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; level::Pkg.Types.UpgradeLevel, mode::Pkg.Types.PackageMode, preserve::Nothing, update_registry::Bool, skip_writing_project::Bool, kwargs::@Kwargs{io::Base.DevNull})
  | @ Pkg.API /net/ocean/home/data44/data5/glwagner/oceananigans-buildkite-16499/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/API.jl:351
  | [8] up
  | @ /net/ocean/home/data44/data5/glwagner/oceananigans-buildkite-16499/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/API.jl:326 [inlined]
  | [9] up
  | @ /net/ocean/home/data44/data5/glwagner/oceananigans-buildkite-16499/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/API.jl:164 [inlined]
  | [10] #resolve#143
  | @ /net/ocean/home/data44/data5/glwagner/oceananigans-buildkite-16499/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/API.jl:357 [inlined]
  | [11] resolve
  | @ /net/ocean/home/data44/data5/glwagner/oceananigans-buildkite-16499/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/API.jl:356 [inlined]

There may be some clues here: https://discourse.julialang.org/t/how-do-i-activate-the-test-environment-of-a-package/86740/3

but we may also want to reach out on the Julia slack

@glwagner
Copy link
Member

The main downside of this PR is that our CI will get clogged up more frequently (sometimes the servers get full, and we have to go in and manually delete the builds that were not deleted during clean up). But we could consider merging this if it seems to help, even if it does not fully solve the issue. Perhaps @navidcy and @simone-silvestri have thoughts.

@simone-silvestri
Copy link
Collaborator

I think to do this cleanly we might want to follow up with what @vchuravy was doing here #3042

@simone-silvestri
Copy link
Collaborator

or at least we could try following that route (it seems like there were some problems also with that implementation)

@jagoosw
Copy link
Collaborator Author

jagoosw commented Aug 2, 2024

I think to do this cleanly we might want to follow up with what @vchuravy was doing here #3042

I tried experimenting with this on the OceanBioME tests and it seems to be preventing it from segfaultinig OceanBioME/OceanBioME.jl#190 (and testing running them both at the same time here OceanBioME/OceanBioME.jl#196)

But it does also seem to run quite a bit slower so I'm going to see how much I can get it to cache without it causing problems.

And then I can copy the implementation over to here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants