Random test failure in sage.rings.polynomial.polynomial_element #39460

Open
kwankyu opened this issue Feb 6, 2025 · 11 comments

@kwankyu
Collaborator

kwankyu commented Feb 6, 2025

This is to track the current CI check failure seen, for example, in:

https://github.com/sagemath/sage/actions/runs/13165003478/job/36743502231?pr=39456

The root cause of the failure seems to be shown by the following, on Ubuntu focal and jammy at least:

sage: N = matrix(Integers(7), 2, [1,2,3,4])
sage: N
[1 2]
[3 4]
sage: N.rank()
------------------------------------------------------------------------
/sage/local/var/lib/sage/venv-python3.12.5/lib/python3.12/site-packages/cysignals/signals.cpython-312-x86_64-linux-gnu.so(+0x9f84)[0x7ffffe8a7f84]
/sage/local/var/lib/sage/venv-python3.12.5/lib/python3.12/site-packages/cysignals/signals.cpython-312-x86_64-linux-gnu.so(+0xa041)[0x7ffffe8a8041]
/sage/local/var/lib/sage/venv-python3.12.5/lib/python3.12/site-packages/cysignals/signals.cpython-312-x86_64-linux-gnu.so(+0xd0e3)[0x7ffffe8ab0e3]
/lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7ffffefa5090]
/sage/local/lib/libfflas.so.1(_ZN5FFLAS5fgemvIN6Givaro7ModularIffvEEEENT_11Element_ptrERKS4_NS_15FFLAS_TRANSPOSEEmmNS4_7ElementENS4_16ConstElement_ptrEmSA_mS9_S5_m+0x3e)[0x7fffaf28b64e]
/sage/src/sage/matrix/matrix_modn_dense_float.cpython-312-x86_64-linux-gnu.so(_ZN6FFPACK18PLUQ_basecaseCroutIN6Givaro7ModularIffvEEEEmRKT_N5FFLAS10FFLAS_DIAGEmmNS4_11Element_ptrEmPmSA_+0x24f)[0x7fffaf3fc03f]
/sage/local/lib/libffpack.so.1(_ZN6FFPACK4RankIN6Givaro7ModularIffvEEEEmRKT_mmNS4_11Element_ptrEm+0x76)[0x7fffaf1c2996]
/sage/src/sage/matrix/matrix_modn_dense_float.cpython-312-x86_64-linux-gnu.so(+0x59144)[0x7fffaf3f0144]
/sage/src/sage/matrix/matrix_modn_dense_float.cpython-312-x86_64-linux-gnu.so(+0x59c6b)[0x7fffaf3f0c6b]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(+0x1ad291)[0x7fffff30a291]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(PyObject_Vectorcall+0x55)[0x7fffff2c23c5]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(_PyEval_EvalFrameDefault+0x48da)[0x7fffff26d68a]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(PyEval_EvalCode+0xb5)[0x7fffff3a5f25]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(+0x245cd8)[0x7fffff3a2cd8]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(_PyEval_EvalFrameDefault+0x7a47)[0x7fffff2707f7]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(+0x17b93d)[0x7fffff2d893d]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(+0x17caf9)[0x7fffff2d9af9]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(_PyEval_EvalFrameDefault+0xa944)[0x7fffff2736f4]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(PyEval_EvalCode+0xb5)[0x7fffff3a5f25]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(+0x2987bd)[0x7fffff3f57bd]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(+0x2988c5)[0x7fffff3f58c5]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(+0x2989cf)[0x7fffff3f59cf]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(_PyRun_SimpleFileObject+0x12e)[0x7fffff3f87de]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(_PyRun_AnyFileObject+0x3f)[0x7fffff3f8d2f]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(Py_RunMain+0x99c)[0x7fffff41b8bc]
/sage/local/var/lib/sage/venv-python3.12.5/lib/libpython3.12.so.1.0(Py_BytesMain+0x3d)[0x7fffff41bd4d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ffffef86083]
python3(_start+0x2e)[0x55555555509e]
------------------------------------------------------------------------
Attaching gdb to process id 375.
/sage/local/var/lib/sage/venv-python3.12.5/bin/cysignals-CSI:86: DeprecationWarning: Use shutil.which instead of find_executable
  whichgdb = find_executable('gdb')
Cannot find gdb installed
GDB is not installed.
Install gdb for enhanced tracebacks.
------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
Illegal instruction

No crash with the SageMath-10-5 app on macOS:
│ SageMath version 10.5, Release Date: 2024-12-04
│ Using Python 3.12.5. Type "help()" for help.
sage: N = matrix(Integers(7), 2, [1,2,3,4])
sage: N
[1 2]
[3 4]
sage: N.rank()
2
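
When reproducing a crash like this, it can help to run the failing command in a child process so that the crash does not take down the session doing the testing. The following is only a sketch: `run_isolated` is a hypothetical helper, and the child used here is a stand-in that raises SIGILL on itself so that the example runs without Sage.

```python
import signal
import subprocess
import sys


def run_isolated(cmd):
    """Run cmd in a child process and report how it ended.

    Returns ("ok", exit_code) on a normal exit, or ("signal", name) when the
    child was killed by a signal such as SIGILL.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        return ("signal", signal.Signals(-proc.returncode).name)
    return ("ok", proc.returncode)


# Stand-in child that raises SIGILL on itself, so the sketch runs without Sage.
child = [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGILL)"]
print(run_isolated(child))
```

With Sage on the PATH, the command could instead be something like `["sage", "-c", "print(matrix(Integers(7), 2, [1,2,3,4]).rank())"]`. How the child is reported then depends on how cysignals terminates the process, which is not verified here.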

@kwankyu
Collaborator Author

kwankyu commented Feb 6, 2025

N.rank() is ultimately computed by the PLUQ routine defined in the file fflas-ffpack/ffpack/ffpack_pluq.inl of the FFLAS-FFPACK package.
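
For reference, the expected answer can be checked independently of FFPACK. The sketch below is plain Gaussian elimination over GF(p) in pure Python, not the PLUQ algorithm that FFPACK uses; `rank_mod_p` is a hypothetical helper name and p is assumed to be prime.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over GF(p), p prime, by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Find a pivot in this column at or below the current rank row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


print(rank_mod_p([[1, 2], [3, 4]], 7))
```

The determinant of the matrix is 1*4 - 2*3 = -2, which is 5 modulo 7 and hence nonzero, so the rank over Integers(7) is 2, matching the macOS output.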

@kwankyu
Collaborator Author

kwankyu commented Feb 6, 2025

Building the fflas_ffpack 2.5.0 spkg fails during autotuning:

[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] ../autotune/tune_fgemm.sh
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] =================================================
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] ========= FFLAS-FFPACK fgemm Autotuning =========
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] =================================================
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] 
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] == Tuning fgemm over Modular<double> ==
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] 
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] ---------------------------------------------------------------------
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] Thu Feb  6 07:03:26 2025
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] 
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] Threshold for finite field Strassen-Winograd matrix multiplication (using Modular_implem<double, double, uint64_t> modulo 17)
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] 
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] fgemm:  n                   Classic                        Winograd 1 level
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install]                     seconds            Gfops          seconds            Gfops
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] make[4]: *** [Makefile:975: autotune] Error 132
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] make[3]: *** [Makefile:1013: autotune] Error 2
[fflas_ffpack-2.5.0+sage-2024-05-18b] [spkg-install] Error tuning fflas-ffpack
[fflas_ffpack-2.5.0+sage-2024-05-18b] ::endgroup::
[fflas_ffpack-2.5.0+sage-2024-05-18b] ************************************************************************
[fflas_ffpack-2.5.0+sage-2024-05-18b] Error installing package fflas_ffpack-2.5.0+sage-2024-05-18b
[fflas_ffpack-2.5.0+sage-2024-05-18b] ************************************************************************
[fflas_ffpack-2.5.0+sage-2024-05-18b] Please email sage-devel (http://groups.google.com/group/sage-devel)
[fflas_ffpack-2.5.0+sage-2024-05-18b] explaining the problem and including the log files
[fflas_ffpack-2.5.0+sage-2024-05-18b]   /sage/logs/pkgs/fflas_ffpack-2.5.0+sage-2024-05-18b.log
[fflas_ffpack-2.5.0+sage-2024-05-18b] and
[fflas_ffpack-2.5.0+sage-2024-05-18b]   /sage/config.log
[fflas_ffpack-2.5.0+sage-2024-05-18b] Describe your computer, operating system, etc.
[fflas_ffpack-2.5.0+sage-2024-05-18b] If you want to try to fix the problem yourself, *don't* just cd to
[fflas_ffpack-2.5.0+sage-2024-05-18b] /sage/local/var/tmp/sage/build/fflas_ffpack-2.5.0+sage-2024-05-18b and type 'make' or whatever is appropriate.
[fflas_ffpack-2.5.0+sage-2024-05-18b] Instead, the following commands setup all environment variables
[fflas_ffpack-2.5.0+sage-2024-05-18b] correctly and load a subshell for you to debug the error:
[fflas_ffpack-2.5.0+sage-2024-05-18b]   (cd '/sage/local/var/tmp/sage/build/fflas_ffpack-2.5.0+sage-2024-05-18b' && '/sage/sage' --buildsh)
[fflas_ffpack-2.5.0+sage-2024-05-18b] When you are done debugging, you can type "exit" to leave the subshell.
[fflas_ffpack-2.5.0+sage-2024-05-18b] ************************************************************************
[fflas_ffpack-2.5.0+sage-2024-05-18b] real 1m2.664s user 3m11.780s sys 0m6.704s
::endgroup::
make[2]: *** [Makefile:3421: fflas_ffpack-SAGE_LOCAL-no-deps] Error 1
make[1]: *** [Makefile:3421: /sage/local/var/lib/sage/installed/fflas_ffpack-2.5.0+sage-2024-05-18b] Error 2
make[1]: Leaving directory '/sage/build/make'
***************************************************************
Error building Sage.

The following package(s) may have failed to build (not necessarily
during this run of 'make fflas_ffpack'):

* package:         fflas_ffpack-2.5.0+sage-2024-05-18b
  last build time: Feb 6 07:03
  log file:        /sage/logs/pkgs/fflas_ffpack-2.5.0+sage-2024-05-18b.log
  build directory: /sage/local/var/tmp/sage/build/fflas_ffpack-2.5.0+sage-2024-05-18b

It is safe to delete any log files and build directories, but they
contain information that is helpful for debugging build problems.
WARNING: If you now run 'make' again, the build directory of the
same version of the package will, by default, be deleted. Set the
environment variable SAGE_KEEP_BUILT_SPKGS=yes to prevent this.

real 1m7.283s user 3m17.483s sys 0m7.027s
make: *** [Makefile:40: fflas_ffpack] Error 1
(sage-buildsh) root@f873c0a48959:sage$ ./sage --package properties fflas_ffpack
        path:                        /sage/build/pkgs/fflas_ffpack
        version_with_patchlevel:     2.5.0+sage-2024-05-18b
        type:                        standard
        source:                      normal
        trees:                       SAGE_LOCAL
        purl:                        pkg:generic/fflas-ffpack

For the Sage in the CI, the system fflas_ffpack is used.
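
The `Error 132` reported by make for the autotune target looks consistent with the `Unhandled SIGILL` in the doctest crash: POSIX shells report a command killed by signal N as exit status 128 + N, and SIGILL is signal 4. A small sketch to decode such a status (`describe_exit_status` is a hypothetical helper, not part of Sage):

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status.

    POSIX shells report a command killed by signal N as status 128 + N.
    Returns the signal name, or None if the status does not encode a signal.
    """
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return None  # not a known signal number on this platform
    return None


print(describe_exit_status(132))
```

If that reading is right, the autotuning binary hit an illegal instruction as well, which would point to the same underlying cause as the `N.rank()` crash.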

@kwankyu
Collaborator Author

kwankyu commented Feb 6, 2025

@ClementPernet any help?

@kwankyu
Collaborator Author

kwankyu commented Feb 6, 2025

If the fflas-ffpack spkg is force-installed with autotuning bypassed, then the bug disappears!

@kwankyu
Collaborator Author

kwankyu commented Feb 6, 2025

So ignoring the system fflas-ffpack on Ubuntu and installing the spkg with autotuning bypassed will fix the bug and also the CI failure.

@dimpase @vbraun Is this reasonable?

@fchapoton
Contributor

This works fine for me (no bug) with

│ SageMath version 10.6.beta5, Release Date: 2025-01-26
│ Using Python 3.12.3. Type "help()" for help.

on Ubuntu 24.04.

@dimpase
Member

dimpase commented Feb 6, 2025

Well, fixing CI surely can be done this way.

OTOH fflas_ffpack was released more than 4 years ago, so it is no wonder it breaks on a much newer OS/compiler.

@nbruin
Contributor

nbruin commented Feb 6, 2025

Doesn't an illegal instruction usually indicate that the code has been compiled for an instruction set that includes an instruction that the actual hardware doesn't support? FFPACK is supposed to be a highly optimized library, right? And for linear algebra, I could see room for using fancy vector instructions. Could it be just a matter that Ubuntu compiled its FFPACK a little enthusiastically in terms of expected architecture and that their standard version happens to assume an instruction that is not available on the CI hardware? Or perhaps not on some CI hardware?

It doesn't need to "break" on a newer OS/compiler for this to happen. It could even be a compiler getting smarter and using more of the instruction set available on the stated target, which would then require the target to be specified more precisely.

I think it mainly points to Ubuntu packaging a miscompiled library and/or GitHub using an Ubuntu version that is inappropriate for (some of) the hardware.
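
One way to test this hypothesis on a given machine would be to compare the CPU feature flags it advertises against the extensions the binary is suspected to need. The sketch below parses `/proc/cpuinfo`-style text. The `REQUIRED` list is purely hypothetical, since which extensions the packaged library actually uses has not been established here, and both function names are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required):
    """Flags from `required` that the CPU does not advertise, sorted."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


# Hypothetical requirement list; which extensions the packaged library
# actually uses is not established in this thread.
REQUIRED = ["avx2", "fma"]

# Sample text standing in for the contents of /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2\n"
print(missing_flags(sample, REQUIRED))
```

On a Linux runner, the sample text would be replaced by `open("/proc/cpuinfo").read()`. A non-empty result on the failing runners and an empty one on working machines would support the explanation above.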

@dimpase
Member

dimpase commented Feb 6, 2025

> an illegal instruction usually indicate that the code has been compiled for an instruction set that includes an instruction that the actual hardware doesn't support

Yes, indeed.

@kwankyu
Collaborator Author

kwankyu commented Feb 6, 2025

> ... I think it mainly points to Ubuntu packaging a miscompiled library and/or github using a ubuntu version that is inappropriate for (some of) the hardware.

I see. That seems to explain the situation: the Ubuntu fflas-ffpack package is compiled for a newer instruction set, but the hardware (of the GitHub CI runner) does not support it. I had wondered why a similar Sage Docker image works well on WSL on my bare metal.

I am not sure about a solution yet...

@kwankyu kwankyu changed the title from "A random doctest failure in Build & Test CI check" to "A random doctest failure in sage.rings.polynomial.polynomial_element" on Feb 7, 2025
@kwankyu kwankyu changed the title from "A random doctest failure in sage.rings.polynomial.polynomial_element" to "Random test failure in sage.rings.polynomial.polynomial_element" on Feb 7, 2025
@kwankyu
Collaborator Author

kwankyu commented Feb 7, 2025

As a solution for these chronic test failures, I created #39470 (and accompanying #39471).
