GH-49321: [C++][Python] Add ASAN / UBSAN pixi builds for Arrow and PyArrow #49849

Open
raulcd wants to merge 22 commits into apache:main from raulcd:GH-49321

@raulcd
Member

@raulcd raulcd commented Apr 23, 2026

Rationale for this change

There have been discussions and appetite in the past for running the Address and Undefined Behaviour Sanitizers on PyArrow.
To run the sanitizers, we have to instrument our build so that CPython and NumPy are also built with sanitizer flags.

What changes are included in this PR?

This PR takes inspiration from other tools that have started instrumenting their sanitizer builds with Pixi, like:

We add 4 new jobs and docker builds for:

  • test-pixi-cpp (builds and tests Arrow CPP with pixi-build)
  • test-pixi-cpp-sanitizer (builds and tests Arrow CPP with sanitizers ON with pixi-build)
  • test-pixi-python (builds and tests PyArrow with pixi-build)
  • test-pixi-python-sanitizer (builds and tests PyArrow with sanitizers ON with pixi-build)

For pixi-python-sanitizer: use the upstream CPython and NumPy pixi.toml files, together with the sanitizer-enabled Arrow C++ pixi.toml, to instrument everything.
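
One way to confirm that the interpreter in such an environment was really configured with sanitizers is to inspect the configure arguments it recorded. The sketch below assumes the CPython build used the standard `./configure` switches `--with-address-sanitizer` and `--with-undefined-behavior-sanitizer`; whether the upstream pixi.toml files pass exactly these switches is not stated in this PR, and the helper name `sanitizers_from_config_args` is made up for illustration.

```python
import shlex
import sysconfig

# CPython ./configure switches for the two sanitizers discussed in this PR.
# Assumption: the instrumented interpreter was configured with these switches.
SANITIZER_FLAGS = {
    "--with-address-sanitizer": "address",
    "--with-undefined-behavior-sanitizer": "undefined",
}


def sanitizers_from_config_args(config_args: str) -> set[str]:
    """Return the sanitizers requested in a CPython CONFIG_ARGS string."""
    found = set()
    for arg in shlex.split(config_args):
        # Split "--flag=value"; an explicit "=no" disables the option.
        name, _, value = arg.partition("=")
        if name in SANITIZER_FLAGS and value != "no":
            found.add(SANITIZER_FLAGS[name])
    return found


if __name__ == "__main__":
    # CONFIG_ARGS is None on builds that do not record it (e.g. Windows).
    args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    print(sorted(sanitizers_from_config_args(args)))
```

Run inside the sanitizer environment, this should list both sanitizers if the assumption holds; an empty list would suggest the interpreter is not instrumented, or was instrumented only through compiler flags.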

Are these changes tested?

Yes. I tested the changes by creating interim tests, on both the Python C++ side and the Cython side, that were expected to read out of bounds (OOB) and trigger sanitizer errors.
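
A minimal sketch of how such an interim check can be structured: the risky call runs in a child interpreter, so an ASan abort does not take down the test runner, and the parent inspects the exit status and stderr. The helper names `run_isolated` and `asan_fired` are hypothetical, and the child here only imitates an ASan banner so that the sketch runs without an instrumented build.

```python
import subprocess
import sys

# Banner text that AddressSanitizer prints at the start of a report.
ASAN_MARKER = "ERROR: AddressSanitizer"


def run_isolated(snippet: str, timeout: float = 30) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, timeout=timeout,
    )


def asan_fired(res: subprocess.CompletedProcess) -> bool:
    """True if the child exited abnormally and printed an ASan error banner."""
    return res.returncode != 0 and ASAN_MARKER in res.stderr


# Stand-in for a real out-of-bounds helper: this child only imitates the
# banner and exit status; it does not perform an actual invalid read.
fake_oob = (
    "import sys; "
    "sys.stderr.write('==1==ERROR: AddressSanitizer: heap-buffer-overflow\\n'); "
    "sys.exit(1)"
)
result = run_isolated(fake_oob)
print(asan_fired(result))
```

In a real sanity test the snippet would import the instrumented extension and call the deliberately buggy function instead of `fake_oob`.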

Are there any user-facing changes?

No

@github-actions

⚠️ GitHub issue #49321 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions Bot added the "awaiting committer review" label Apr 23, 2026
@raulcd
Member Author

raulcd commented Apr 23, 2026

@github-actions crossbow submit test-pixi-cpp

@github-actions

Revision: a27d06c

Submitted crossbow builds: ursacomputing/crossbow @ actions-fd33b1f179

Task Status
test-pixi-cpp GitHub Actions

@raulcd
Member Author

raulcd commented Apr 24, 2026

@github-actions crossbow submit test-pixi-*

@github-actions

Revision: 17bb233

Submitted crossbow builds: ursacomputing/crossbow @ actions-c9de1cf29d

Task Status
test-pixi-cpp GitHub Actions
test-pixi-python GitHub Actions

@raulcd
Member Author

raulcd commented Apr 24, 2026

@github-actions crossbow submit test-pixi-*

@github-actions

Revision: 3f7725b

Submitted crossbow builds: ursacomputing/crossbow @ actions-3daa50a309

Task Status
test-pixi-cpp GitHub Actions
test-pixi-python GitHub Actions

Comment thread ci/pixi/default/python/pixi.toml Outdated
preview = ["pixi-build"]

[environments]
default = []

(I think this is implicit)

@raulcd
Member Author

raulcd commented Apr 28, 2026

@github-actions crossbow submit test-pixi-*

@github-actions

Revision: fafcb99

Submitted crossbow builds: ursacomputing/crossbow @ actions-2d14fbba39

Task Status
test-pixi-cpp GitHub Actions
test-pixi-cpp-sanitizer GitHub Actions
test-pixi-python GitHub Actions

@raulcd
Member Author

raulcd commented Apr 28, 2026

@github-actions crossbow submit test-pixi-*

@github-actions

Revision: 4757b86

Submitted crossbow builds: ursacomputing/crossbow @ actions-d318220858

Task Status
test-pixi-cpp GitHub Actions
test-pixi-cpp-sanitizer GitHub Actions
test-pixi-python GitHub Actions

@raulcd
Member Author

raulcd commented Apr 28, 2026

@github-actions crossbow submit test-pixi-python-sanitizer

@github-actions

Revision: ad4a829

Submitted crossbow builds: ursacomputing/crossbow @ actions-c442e4bcac

Task Status
test-pixi-python-sanitizer GitHub Actions

@raulcd
Member Author

raulcd commented Apr 28, 2026

@github-actions crossbow submit test-pixi-python-sanitizer

@github-actions

Revision: 9d4dd89

Submitted crossbow builds: ursacomputing/crossbow @ actions-1fd6ac9933

Task Status
test-pixi-python-sanitizer GitHub Actions

@raulcd
Member Author

raulcd commented Apr 28, 2026

@github-actions crossbow submit test-pixi-python-sanitizer

@github-actions

Revision: aa462b6

Submitted crossbow builds: ursacomputing/crossbow @ actions-c76613ce90

Task Status
test-pixi-python-sanitizer GitHub Actions

@raulcd raulcd changed the title from "GH-49321: [C++][Python] Add pixi build for Arrow and PyArrow" to "GH-49321: [C++][Python] Add ASAN / UBSAN pixi builds for Arrow and PyArrow" Apr 28, 2026
@raulcd
Member Author

raulcd commented Apr 28, 2026

I think it works. I was expecting more failures :)

__________________________ test_asan_sanity_oob_numpy __________________________
    def test_asan_sanity_oob_numpy():
        # TODO: ASAN sanity check (libarrow_python.so / numpy_convert.cc).
        # Same pattern as test_asan_sanity_oob. Revert before final commit.
        res = subprocess.run(
            [sys.executable, "-c",
             "import pyarrow.lib; pyarrow.lib._asan_sanity_oob_numpy()"],
            capture_output=True, text=True, timeout=30,
        )
>       assert res.returncode == 0, (
            f"ASAN caught heap-buffer-overflow in libarrow_python.\n"
            f"returncode={res.returncode}\n"
            f"stderr:\n{res.stderr}"
        )
E       AssertionError: ASAN caught heap-buffer-overflow in libarrow_python.
E         returncode=1
E         stderr:
E         =================================================================
E         ==23268==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7b5eb9a14900 at pc 0x7b3eb364e7a8 bp 0x7ffe91d6cfa0 sp 0x7ffe91d6cf98
E         READ of size 4 at 0x7b5eb9a14900 thread T0
E             #0 0x7b3eb364e7a7 in arrow::py::AsanSanityOobNumpy() /arrow/python/pyarrow/src/arrow/python/numpy_convert.cc:565
E             #1 0x7b3eb5d8f14d in __pyx_pf_7pyarrow_3lib_6_asan_sanity_oob_numpy /tmp/tmprle2nfdj/build/lib.cpp:21869
E             #2 0x7b3eb5d8f129 in __pyx_pw_7pyarrow_3lib_7_asan_sanity_oob_numpy /tmp/tmprle2nfdj/build/lib.cpp:21850
E             #3 0x7b3eb67275c5 in __Pyx_CyFunction_Vectorcall_NOARGS /tmp/tmprle2nfdj/build/lib.cpp:345932
E             #4 0x7f3ebb0c546b in _PyObject_VectorcallTstate /usr/local/src/conda/python-3.15/Include/internal/pycore_call.h:136
E             #5 0x7f3ebb0c546b in PyObject_Vectorcall /usr/local/src/conda/python-3.15/Objects/call.c:327
E             #6 0x7f3ebb376d2e in _Py_VectorCallInstrumentation_StackRefSteal /usr/local/src/conda/python-3.15/Python/ceval.c:762
E             #7 0x7f3ebb384d29 in _PyEval_EvalFrameDefault /usr/local/src/conda/python-3.15/Python/generated_cases.c.h:1817
E             #8 0x7f3ebb3b87e6 in _PyEval_EvalFrame /usr/local/src/conda/python-3.15/Include/internal/pycore_ceval.h:118
E             #9 0x7f3ebb3b87e6 in _PyEval_Vector /usr/local/src/conda/python-3.15/Python/ceval.c:2125
E             #10 0x7f3ebb3b8b58 in PyEval_EvalCode /usr/local/src/conda/python-3.15/Python/ceval.c:673
E             #11 0x7f3ebb4c1f74 in run_mod /usr/local/src/conda/python-3.15/Python/pythonrun.c:1469
E             #12 0x7f3ebb4c275b in _PyRun_StringFlagsWithName /usr/local/src/conda/python-3.15/Python/pythonrun.c:1258
E             #13 0x7f3ebb4c7348 in _PyRun_SimpleStringFlagsWithName /usr/local/src/conda/python-3.15/Python/pythonrun.c:572
E             #14 0x7f3ebb52a1fc in pymain_run_command /usr/local/src/conda/python-3.15/Modules/main.c:261
E             #15 0x7f3ebb52a1fc in pymain_run_python /usr/local/src/conda/python-3.15/Modules/main.c:682
E             #16 0x7f3ebb52a1fc in Py_RunMain /usr/local/src/conda/python-3.15/Modules/main.c:772
E             #17 0x7f3ebb52bf21 in pymain_main /usr/local/src/conda/python-3.15/Modules/main.c:802
E             #18 0x7f3ebb52c2e8 in Py_BytesMain /usr/local/src/conda/python-3.15/Modules/main.c:826
E             #19 0x7f3ebac181c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
E             #20 0x7f3ebac1828a in __libc_start_main_impl ../csu/libc-start.c:360
E             #21 0x55e6b5dce09d in _start (/arrow/ci/pixi/asan/python/.pixi/envs/test/bin/python3.15+0x109d)
E         
E         0x7b5eb9a14900 is located 396 bytes after 4-byte region [0x7b5eb9a14770,0x7b5eb9a14774)
E         allocated by thread T0 here:
E             #0 0x7f3ebbd1b6eb in malloc ../../../../libsanitizer/asan/asan_malloc_linux.cpp:67
E             #1 0x7b3eb364e71f in arrow::py::AsanSanityOobNumpy() /arrow/python/pyarrow/src/arrow/python/numpy_convert.cc:564
E             #2 0x7b3eb5d8f14d in __pyx_pf_7pyarrow_3lib_6_asan_sanity_oob_numpy /tmp/tmprle2nfdj/build/lib.cpp:21869
E             #3 0x7b3eb5d8f129 in __pyx_pw_7pyarrow_3lib_7_asan_sanity_oob_numpy /tmp/tmprle2nfdj/build/lib.cpp:21850
E             #4 0x7b3eb67275c5 in __Pyx_CyFunction_Vectorcall_NOARGS /tmp/tmprle2nfdj/build/lib.cpp:345932
E             #5 0x7f3ebb0c546b in _PyObject_VectorcallTstate /usr/local/src/conda/python-3.15/Include/internal/pycore_call.h:136
E             #6 0x7f3ebb0c546b in PyObject_Vectorcall /usr/local/src/conda/python-3.15/Objects/call.c:327
E             #7 0x7f3ebb376d2e in _Py_VectorCallInstrumentation_StackRefSteal /usr/local/src/conda/python-3.15/Python/ceval.c:762
E             #8 0x7f3ebb384d29 in _PyEval_EvalFrameDefault /usr/local/src/conda/python-3.15/Python/generated_cases.c.h:1817
E             #9 0x7f3ebb3b87e6 in _PyEval_EvalFrame /usr/local/src/conda/python-3.15/Include/internal/pycore_ceval.h:118
E             #10 0x7f3ebb3b87e6 in _PyEval_Vector /usr/local/src/conda/python-3.15/Python/ceval.c:2125
E             #11 0x7f3ebb3b8b58 in PyEval_EvalCode /usr/local/src/conda/python-3.15/Python/ceval.c:673
E             #12 0x7f3ebb4c1f74 in run_mod /usr/local/src/conda/python-3.15/Python/pythonrun.c:1469
E             #13 0x7f3ebb4c275b in _PyRun_StringFlagsWithName /usr/local/src/conda/python-3.15/Python/pythonrun.c:1258
E             #14 0x7f3ebb4c7348 in _PyRun_SimpleStringFlagsWithName /usr/local/src/conda/python-3.15/Python/pythonrun.c:572
E             #15 0x7f3ebb52a1fc in pymain_run_command /usr/local/src/conda/python-3.15/Modules/main.c:261
E             #16 0x7f3ebb52a1fc in pymain_run_python /usr/local/src/conda/python-3.15/Modules/main.c:682
E             #17 0x7f3ebb52a1fc in Py_RunMain /usr/local/src/conda/python-3.15/Modules/main.c:772
E             #18 0x7f3ebb52bf21 in pymain_main /usr/local/src/conda/python-3.15/Modules/main.c:802
E             #19 0x7f3ebb52c2e8 in Py_BytesMain /usr/local/src/conda/python-3.15/Modules/main.c:826
E             #20 0x7f3ebac181c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
E             #21 0x7f3ebac1828a in __libc_start_main_impl ../csu/libc-start.c:360
E             #22 0x55e6b5dce09d in _start (/arrow/ci/pixi/asan/python/.pixi/envs/test/bin/python3.15+0x109d)
E         
E         SUMMARY: AddressSanitizer: heap-buffer-overflow /arrow/python/pyarrow/src/arrow/python/numpy_convert.cc:565 in arrow::py::AsanSanityOobNumpy()
E         Shadow bytes around the buggy address:
E           0x7b5eb9a14680: fa fa 00 00 fa fa 00 00 fa fa 00 00 fa fa 00 00
E           0x7b5eb9a14700: fa fa 00 00 fa fa 00 00 fa fa 00 00 fa fa 04 fa
E           0x7b5eb9a14780: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
E           0x7b5eb9a14800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
E           0x7b5eb9a14880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
E         =>0x7b5eb9a14900:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
E           0x7b5eb9a14980: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
E           0x7b5eb9a14a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
E           0x7b5eb9a14a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
E           0x7b5eb9a14b00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
E           0x7b5eb9a14b80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
E         Shadow byte legend (one shadow byte represents 8 application bytes):
E           Addressable:           00
E           Partially addressable: 01 02 03 04 05 06 07 
E           Heap left redzone:       fa
E           Freed heap region:       fd
E           Stack left redzone:      f1
E           Stack mid redzone:       f2
E           Stack right redzone:     f3
E           Stack after return:      f5
E           Stack use after scope:   f8
E           Global redzone:          f9
E           Global init order:       f6
E           Poisoned by user:        f7
E           Container overflow:      fc
E           Array cookie:            ac
E           Intra object redzone:    bb
E           ASan internal:           fe
E           Left alloca redzone:     ca
E           Right alloca redzone:    cb
E         ==23268==ABORTING
E         
E       assert 1 == 0
E        +  where 1 = CompletedProcess(args=['/arrow/ci/pixi/asan/python/.pixi/envs/test/bin/python3.15', '-c', 'import pyarrow.lib; pyarrow... bb\n  ASan internal:           fe\n  Left alloca redzone:     ca\n  Right alloca redzone:    cb\n==23268==ABORTING\n').returncode
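
The report above can be cross-checked mechanically. The following sketch (helper names are hypothetical) extracts the error kind and source location from the `SUMMARY` line and recomputes the "located N bytes after" distance from the addresses printed in the report; both input lines are copied from the log above.

```python
import re

# Matches e.g. "SUMMARY: AddressSanitizer: <kind> <file>:<line> in <function>".
SUMMARY_RE = re.compile(
    r"SUMMARY: AddressSanitizer: (?P<kind>\S+) "
    r"(?P<file>\S+):(?P<line>\d+) in (?P<func>.+)$"
)
# Matches the "is located N bytes after M-byte region [start,end)" line.
LOCATED_RE = re.compile(
    r"(?P<addr>0x[0-9a-f]+) is located (?P<dist>\d+) bytes after "
    r"(?P<size>\d+)-byte region \[(?P<start>0x[0-9a-f]+),(?P<end>0x[0-9a-f]+)\)"
)


def parse_summary(report: str):
    """Return (kind, file, line, function) from the SUMMARY line, or None."""
    for raw in report.splitlines():
        m = SUMMARY_RE.search(raw)
        if m:
            return m["kind"], m["file"], int(m["line"]), m["func"].strip()
    return None


def check_located(report: str):
    """Recompute region size and overflow distance from the raw addresses."""
    m = LOCATED_RE.search(report)
    if m is None:
        return None
    addr, start, end = (int(m[k], 16) for k in ("addr", "start", "end"))
    return {
        "region_size": end - start,
        "distance": addr - end,
        "consistent": addr - end == int(m["dist"])
        and end - start == int(m["size"]),
    }


report = """\
0x7b5eb9a14900 is located 396 bytes after 4-byte region [0x7b5eb9a14770,0x7b5eb9a14774)
SUMMARY: AddressSanitizer: heap-buffer-overflow /arrow/python/pyarrow/src/arrow/python/numpy_convert.cc:565 in arrow::py::AsanSanityOobNumpy()
"""
print(parse_summary(report))
print(check_located(report))
```

For this report the recomputed values agree with the printed ones: the region is 4 bytes long and the faulting address lies 396 bytes past its end.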

@raulcd
Member Author

raulcd commented Apr 30, 2026

@github-actions crossbow submit test-pixi-*

@github-actions

Revision: 5917506

Submitted crossbow builds: ursacomputing/crossbow @ actions-c352190c2e

Task Status
test-pixi-cpp GitHub Actions
test-pixi-cpp-sanitizer GitHub Actions
test-pixi-python GitHub Actions
test-pixi-python-sanitizer GitHub Actions
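
The `test-pixi-*` argument above selected all four new tasks. A minimal sketch of that selection, assuming the wildcard is expanded with shell-style glob matching against task names (an assumption about crossbow that this thread does not confirm); `example-other-task` is a made-up name added only for contrast.

```python
from fnmatch import fnmatchcase

# The four task names come from the PR description; the last entry is a
# hypothetical non-matching task used only to show the filter at work.
tasks = [
    "test-pixi-cpp",
    "test-pixi-cpp-sanitizer",
    "test-pixi-python",
    "test-pixi-python-sanitizer",
    "example-other-task",
]


def select(pattern: str, names: list[str]) -> list[str]:
    """Return the names matching a shell-style glob, preserving order."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(select("test-pixi-*", tasks))
print(select("test-pixi-python-sanitizer", tasks))
```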

@raulcd
Member Author

raulcd commented May 4, 2026

@github-actions crossbow submit test-pixi-*

@github-actions

github-actions Bot commented May 4, 2026

Revision: f142864

Submitted crossbow builds: ursacomputing/crossbow @ actions-9d972c1253

Task Status
test-pixi-cpp GitHub Actions
test-pixi-cpp-sanitizer GitHub Actions
test-pixi-python GitHub Actions
test-pixi-python-sanitizer GitHub Actions

@raulcd
Member Author

raulcd commented May 4, 2026

@github-actions crossbow submit test-pixi-*

@github-actions

github-actions Bot commented May 4, 2026

Revision: 1a9c8e7

Submitted crossbow builds: ursacomputing/crossbow @ actions-46cf962c89

Task Status
test-pixi-cpp GitHub Actions
test-pixi-cpp-sanitizer GitHub Actions
test-pixi-python GitHub Actions
test-pixi-python-sanitizer GitHub Actions

@raulcd
Member Author

raulcd commented May 5, 2026

@github-actions crossbow submit test-pixi-*

@github-actions

github-actions Bot commented May 5, 2026

Revision: bcaadfa

Submitted crossbow builds: ursacomputing/crossbow @ actions-c73780e5d4

Task Status
test-pixi-cpp GitHub Actions
test-pixi-cpp-sanitizer GitHub Actions
test-pixi-python GitHub Actions
test-pixi-python-sanitizer GitHub Actions

@raulcd raulcd marked this pull request as ready for review May 5, 2026 12:34
@lucascolley

@raulcd nice work! How are you finding the speed for both building the instrumented packages and testing with them? If it doesn't seem slow then maybe you have fixed something on top of what I had in scipy/scipy#24066 ?

@raulcd
Member Author

raulcd commented May 5, 2026

Hi @lucascolley! Compared with the existing ASAN/UBSAN job for Arrow C++, there is definitely a difference, but that job uses ccache and several system dependencies, so I don't think the comparison is fair.
Comparing the Arrow C++ build (+ tests) between pixi builds with and without ASAN, the increase is ~10 minutes (+27%).
Comparing the PyArrow build (+ tests) between pixi builds with and without ASAN, the increase is ~28 minutes (+65%), but in the ASAN case we are also building CPython and NumPy from source, which we do not do in the other case.

| Job | Type | Time |
|---|---|---|
| Current ASAN UBSAN Arrow C++ (+ctest) | no pixi | 12m 35s |
| New Arrow C++ (+ctest) | pixi | 36m 20s |
| New ASAN UBSAN Arrow C++ (+ctest) | pixi | 46m 17s |
| New PyArrow + Arrow C++ (+pytest) | pixi | 41m 37s |
| New ASAN UBSAN PyArrow + Arrow C++ + CPython + Numpy (+pytest) | pixi | 1h 8m 4s |
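
For reference, the overhead can be recomputed directly from the times listed above. This is only an arithmetic sketch with hypothetical helper names. From these figures the Arrow C++ increase comes out at about 10 minutes (+27%), and the PyArrow increase at about 26 minutes (+64%), slightly below the quoted ~28 minutes (+65%), which may reflect rounding or a different run.

```python
def seconds(hours: int = 0, minutes: int = 0, secs: int = 0) -> int:
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + secs


def overhead(baseline: int, instrumented: int) -> tuple[int, float]:
    """Return (extra seconds, percentage increase over the baseline)."""
    extra = instrumented - baseline
    return extra, 100.0 * extra / baseline


# Times copied from the pixi rows of the timing list above.
cpp_extra, cpp_pct = overhead(seconds(minutes=36, secs=20),
                              seconds(minutes=46, secs=17))
py_extra, py_pct = overhead(seconds(minutes=41, secs=37),
                            seconds(hours=1, minutes=8, secs=4))

print(f"Arrow C++: +{cpp_extra // 60}m {cpp_extra % 60}s ({cpp_pct:.1f}%)")
# prints "Arrow C++: +9m 57s (27.4%)"
print(f"PyArrow:   +{py_extra // 60}m {py_extra % 60}s ({py_pct:.1f}%)")
# prints "PyArrow:   +26m 27s (63.6%)"
```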

What were the build times you were experiencing?

@lucascolley

great thanks, I've triggered a new run over at scipy/scipy#24066 to get a new set of timings

@lucascolley

lucascolley commented May 5, 2026

> What were the build times you were experiencing?

https://github.com/scipy/scipy/actions/runs/25401648241/job/74502497187 on osx-arm64 shows ~12mins to build ASAN CPython and NumPy, ~10mins to build SciPy, and ~60mins to run the ASAN tests. The existing setup using pyenv takes a similar amount of time to build SciPy, but only ~5mins to build ASAN CPython and NumPy, and ~30mins to run the tests: https://github.com/scipy/scipy/actions/runs/25397696452/job/74488793051?pr=24956. I wonder whether extra instrumentation explains both the slower build and test times?

On Linux, I'm seeing ~16mins to build ASAN CPython and NumPy, ~20mins to build SciPy, and (still going) >60mins to run the tests: https://github.com/scipy/scipy/actions/runs/25401648241/job/74502497188?pr=24066.
