Skip to content

DEV: setup pixi #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 5, 2025
Merged

DEV: setup pixi #6

merged 5 commits into from
Apr 5, 2025

Conversation

lucascolley
Copy link
Member

@lucascolley lucascolley commented Nov 13, 2024

@lucascolley lucascolley force-pushed the pixi branch 4 times, most recently from 7c5bcc5 to faf3164 Compare November 13, 2024 20:53
@lucascolley lucascolley force-pushed the pixi branch 4 times, most recently from b90e9de to 8c71035 Compare April 4, 2025 21:52
pixi.toml Outdated

[feature.build.tasks.build-only]
# Just build without configure
cmd = ["cmake", "--build", "build", "-j2"]
Copy link
Collaborator

@steppi steppi Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -j2 is supposed to be specific to the CI environment. I saw SciPy hardcoded some -j2s for parallel execution in some of its workflows on Linux and copied that. I considered using $(nproc) in CI and maybe we should still do that. I'm not sure what the default should be. Locally I build with "-j24".

@steppi
Copy link
Collaborator

steppi commented Apr 4, 2025

Thanks for being proactive @lucascolley! I was planning to ping you about this to see if you wanted to pick it up again.

@lucascolley lucascolley marked this pull request as ready for review April 4, 2025 22:12
@lucascolley
Copy link
Member Author

should be ready from my side if CI is happy! BTW, I am consistently getting 2 failures locally:

 32/231 Test  #32: cyl_bessel_i dd->d scipy_special_tests .....................***Failed   27.28 sec
Filters: "cyl_bessel_i dd->d scipy_special_tests"
Randomness seeded to: 1662007158

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scipy_special_tests_test_cyl_bessel_i is a Catch2 v3.8.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
cyl_bessel_i dD->D scipy_special_tests
-------------------------------------------------------------------------------
/home/lucas/dev/xsf/./tests/scipy_special_tests/test_cyl_bessel_i.cpp:25
...............................................................................

/home/lucas/dev/xsf/./tests/scipy_special_tests/test_cyl_bessel_i.cpp:40: FAILED:
  REQUIRE( error <= tol )
with expansion:
  0.00000000000001591 <= 0.00000000000000645
with messages:
  v := 36.27060791878914614
  z := (-2.23743,0)
  out := (4.04936e-41,4.61082e-41)
  desired := (4.04936e-41,4.61082e-41)
  error := 0.00000000000001591
  tol := 0.00000000000000645
  fallback := false

===============================================================================
test cases:       2 |       1 passed | 1 failed
assertions: 2010919 | 2010918 passed | 1 failed
 35/231 Test  #33: cyl_bessel_i dD->D scipy_special_tests .....................***Failed   27.40 sec
Filters: "cyl_bessel_i dD->D scipy_special_tests"
Randomness seeded to: 1931919712

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scipy_special_tests_test_cyl_bessel_i is a Catch2 v3.8.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
cyl_bessel_i dD->D scipy_special_tests
-------------------------------------------------------------------------------
/home/lucas/dev/xsf/./tests/scipy_special_tests/test_cyl_bessel_i.cpp:25
...............................................................................

/home/lucas/dev/xsf/./tests/scipy_special_tests/test_cyl_bessel_i.cpp:40: FAILED:
  REQUIRE( error <= tol )
with expansion:
  0.00000000000001591 <= 0.00000000000000645
with messages:
  v := 36.27060791878914614
  z := (-2.23743,0)
  out := (4.04936e-41,4.61082e-41)
  desired := (4.04936e-41,4.61082e-41)
  error := 0.00000000000001591
  tol := 0.00000000000000645
  fallback := false

===============================================================================
test cases:       2 |       1 passed | 1 failed
assertions: 2010919 | 2010918 passed | 1 failed

@steppi
Copy link
Collaborator

steppi commented Apr 4, 2025

Thanks @lucascolley. At this point local failures are expected unless you're using 64 bit Linux with GCC. The tolerances are stored in parquet files and should be platform specific, but currently we only have tolerances for one platform. I think what I'll do is first write some logic that detects the platform and applies a looser tolerance to a default tol file if there's no platform specific tol file. That should give a short term fix to the annoyance of local failures at least.

@lucascolley
Copy link
Member Author

With Pixi in this PR I would expect CI to be using the same compilers as I am on Linux64 locally, so I'm not sure where the discrepancy is... but as long as CI is passing :)

@steppi
Copy link
Collaborator

steppi commented Apr 4, 2025

Oh, that's interesting. You're also using 64 bit Linux locally with GCC? What's your GCC version? Locally I have 11.4.0, which should be the same as in CI. It's just what ships with Ubuntu 22.04. The values in the parquet tolerance files are the (extended) relative error values that I got locally. When the actual tests are run, there's currently a 4x multiplier applied over this to get the tolerances and it looks like you're seeing more than another 2x on top of that. Probably some optimization that's messing with results. Stephen Moshier, the Cephes author told me that whenever we see an optimization messing with accuracy like this, we should consider it a compiler bug. Apparently he used to submit patches to compilers fixing things like this when he was actively developing Cephes.

@steppi
Copy link
Collaborator

steppi commented Apr 4, 2025

One thing. How can one pass a -j when building the tests and running them with pixi to enable parallelization? The speedup is significant on my machine. We can do this in the workflow to enable parallelization again there too.

@lucascolley
Copy link
Member Author

You can see in CI (https://github.com/scipy/xsf/actions/runs/14275292967/job/40017104296?pr=6) that it is using

gcc                               13.3.0        h9576a4e_2        54 KiB     conda  https://prefix.dev/conda-forge/
gcc_impl_linux-64                 13.3.0        h1e990d8_2        63.7 MiB   conda  https://prefix.dev/conda-forge/
gcc_linux-64                      13.3.0        hc28eda2_8        31.7 KiB   conda  https://prefix.dev/conda-forge/

which should be exactly reproducible locally on Linux with pixi run tests.

@lucascolley
Copy link
Member Author

lucascolley commented Apr 4, 2025

One thing. How can one pass a -j when building the tests and running them with pixi to enable parallelization? The speedup is significant on my machine. We can do this in the workflow to enable parallelization again there too.

pixi run --skip-deps tests -j24 (I think, untested!)

likewise pixi run configure-tests -j24

@steppi
Copy link
Collaborator

steppi commented Apr 4, 2025

Oh, doubly interesting. You also have GCC 13.3.0 locally but you're still seeing local failures?

@lucascolley
Copy link
Member Author

lucascolley commented Apr 4, 2025

I've switched off my linux box for the night, but if both in CI and locally the compilers installed by Pixi are used, then they should be identical (==what is specfied in pixi.lock). Perhaps either locally or in CI system GCC is being used for whatever reason.

@steppi
Copy link
Collaborator

steppi commented Apr 4, 2025

Thanks @lucascolley, I know it's very late in your timezone so please disregard until a more reasonable time.

That makes sense. I didn't realize pixi was installing a compiler. We'll want to make it easy use different compilers and test on different platforms. Right now pixi.lock is looking Linux specific but is it easy to incrementally go from here to make this work on different platforms? Sorry, it looks like it does work across platforms already. Very cool. Also, I see pixi supports macOS, Linux, and Windows according to its documentation. I think those are the only OS's we would test on, but it would be good if xsf could still be used conveniently on other OS's.

@steppi
Copy link
Collaborator

steppi commented Apr 4, 2025

I tried it out on my Mac mini and am getting weird type conversion bugs from the Parquet StreamReader. It's not related to this PR though and I think I know how to fix it.

@lucascolley
Copy link
Member Author

We'll want to make it easy use different compilers and test on different platforms

Yep very doable, see e.g. Ralf's work at https://github.com/rgommers/pixi-dev-scipystack/blob/main/scipy/pixi.toml which is a lot more complex than xsf.

Simple things like generating a matrix of envs with different GCC versions are completely trivial to do.

@steppi
Copy link
Collaborator

steppi commented Apr 5, 2025

pixi run configure-tests -j24

I tried it out. This was close. It should be

pixi run configure-tests
pixi run build-only -j24
pixi run --skip-deps tests -j24

Copy link
Collaborator

@steppi steppi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lucascolley. I've never used pixibefore, but it seems convenient so I'm happy to do this. I'm going to go ahead and merge this. We can iterate from here.

@steppi steppi merged commit 99ad371 into scipy:main Apr 5, 2025
1 check passed
Comment on lines +30 to +32
pixi run configure-tests
pixi run build-only -j2
pixi run tests -j2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite right, since xsref isn't cloned until the last step.

Suggested change
pixi run configure-tests
pixi run build-only -j2
pixi run tests -j2
pixi run clone-xsref
pixi run configure-tests
pixi run build-only -j2
pixi run --skip-deps tests -j2

@lucascolley lucascolley deleted the pixi branch April 5, 2025 07:08
@lucascolley
Copy link
Member Author

lucascolley commented Apr 5, 2025

I tried it out on my Mac mini and am getting weird type conversion bugs from the Parquet StreamReader. It's not related to this PR though and I think I know how to fix it.

I get these on osx-arm64 too. Looks like the problem is with

template <typename U>
void fill_element(U &element) {
if constexpr (std::is_same_v<U, std::complex<remove_complex_t<U>>>) {
using V = remove_complex_t<U>;
V real;
V imag;
*stream_ >> real >> imag;
element = U(real, imag);
} else {
*stream_ >> element;
}
}

@steppi
Copy link
Collaborator

steppi commented Apr 5, 2025

I tried it out on my Mac mini and am getting weird type conversion bugs from the Parquet StreamReader. It's not related to this PR though and I think I know how to fix it.

I get these on osx-arm64 too. Looks like the problem is with

template <typename U>
void fill_element(U &element) {
if constexpr (std::is_same_v<U, std::complex<remove_complex_t<U>>>) {
using V = remove_complex_t<U>;
V real;
V imag;
*stream_ >> real >> imag;
element = U(real, imag);
} else {
*stream_ >> element;
}
}

Good catch. Yes, that's the problem. It should look like this

    template <typename U>
    void fill_element(U &element) {
        if constexpr (std::is_same_v<U, std::complex<remove_complex_t<U>>>) {
            using V = remove_complex_t<U>;
            V real;
            V imag;
            *stream_ >> real >> imag;
            element = U(real, imag);
	} else if constexpr (std::is_same_v<U, std::ptrdiff_t>) {
	    std::int64_t val;
	    *stream_ >> val;
	    element = static_cast<std::ptrdiff_t>(val);
        } else{
            *stream_ >> element;
        }
    }

This is related to the NumPy default int stuff, numpy/numpy#9464, scipy/scipy#21401. The idea was to test the scalar kernels with the integer type corresponding to the NumPy default int. But the tables store the ints as int64, so if we want to test with ptrdiff_t we need to unpack into a 64 bit int first, and then cast to a ptrdiff_t. The bytes read from the table were getting out of alignment, so we were getting all kinds of error messages.

@steppi steppi mentioned this pull request Apr 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants