@raayandhar

In theory this should fix #4343, but I can't reproduce the failure locally yet, so I'm not sure. Working on a local repro first.

@raayandhar raayandhar marked this pull request as ready for review October 15, 2025 20:03
@raayandhar
Author

raayandhar commented Oct 15, 2025

Managed to repro locally and did some digging into what's happening. This now builds locally; hopefully it will work in CI as well.

@zjgarvey
Collaborator

I'm not sure it is sufficient to grep for the CXX ABI version in the shared object file.

E.g., if I install the nightly build we have pinned at 8/20, torch._C._PYBIND_BUILD_ABI returns _cxxabi1018, but the grep output only goes up to CXXABI_1.3.11:

CXXABI_1.3.2
CXXABI_1.3.3
CXXABI_1.3.5
CXXABI_1.3.7
CXXABI_1.3.8
CXXABI_1.3.9
CXXABI_1.3.11

We might need to figure out a different approach.

@zjgarvey
Collaborator

However, I do find _cxxabi1018 inside the libtorch_python.so. I'm not sure if this will be present in the newer release. I'll double check now.
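
A small sketch of why the two searches disagree, as far as I understand it: the `CXXABI_1.3.x` strings are libstdc++ symbol-version tags, while `_cxxabi1018` is the kind of tag pybind11 derives from the compiler's `__GXX_ABI_VERSION`, so the two follow different numbering schemes. The helper name `scan_abi_markers` and the sample bytes below are made up for illustration; nothing here reads a real torch install.

```python
import re

# libstdc++ symbol-version tags, e.g. b"CXXABI_1.3.11"
SYMBOL_VERSION_RE = re.compile(rb"CXXABI_(\d+(?:\.\d+)+)")
# pybind11-style build-ABI tag, e.g. b"_cxxabi1018"
PYBIND_TAG_RE = re.compile(rb"_cxxabi(\d{4})")


def scan_abi_markers(blob: bytes):
    """Hypothetical helper: return (symbol versions, pybind-style tags) found in blob."""
    versions = {
        tuple(int(part) for part in m.group(1).split(b"."))
        for m in SYMBOL_VERSION_RE.finditer(blob)
    }
    tags = {m.group(0).decode("ascii") for m in PYBIND_TAG_RE.finditer(blob)}
    # Sort versions numerically so 1.3.9 comes before 1.3.11.
    return [".".join(map(str, v)) for v in sorted(versions)], sorted(tags)


# Synthetic stand-in for the bytes of a shared object (an assumption, not real data).
sample = b"\x00CXXABI_1.3.9\x00CXXABI_1.3.11\x00CXXABI_1.3.2\x00_cxxabi1018\x00"
versions, tags = scan_abi_markers(sample)
print(versions)  # ['1.3.2', '1.3.9', '1.3.11']
print(tags)      # ['_cxxabi1018']
```

To try this on an actual library, one could pass `pathlib.Path(path_to_so).read_bytes()` as `blob`; whether the `_cxxabi...` tag shows up depends on the build being inspected.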

@raayandhar
Author

> However, I do find _cxxabi1018 inside the libtorch_python.so. I'm not sure if this will be present in the newer release. I'll double check now.

If I remember correctly, I had tried looking yesterday and did not find any _cxxabi10... in the newer release (torch-stable). But worth double checking.

@zjgarvey
Collaborator

Yeah, honestly, I have no idea what the right fix is. I've tried modifying a few things, e.g., updating pybind to 3.0.1 to match PyTorch and removing the explicit CXX_ABI flags. No matter what, if it compiles, it then fails tests with an opaque error about types (likely meaning something went wrong with ABI compatibility).

I'm really not sure if it is tenable to fix this in a short amount of time, and considering this is blocking all work on this repo, I'm going to work on pulling out the e2e testing from projects/pt1 and reworking all essential dev tools to not rely on jit_ir_importer.

@raayandhar
Author

The most recent CI run shows these failures:

  Failed Tests (9):
    TORCH_MLIR_PYTHON :: annotations-sugar.py
    TORCH_MLIR_PYTHON :: compile_api/already_scripted.py
    TORCH_MLIR_PYTHON :: compile_api/already_traced.py
    TORCH_MLIR_PYTHON :: compile_api/backend_legal_ops.py
    TORCH_MLIR_PYTHON :: compile_api/basic.py
    TORCH_MLIR_PYTHON :: compile_api/make_fx.py
    TORCH_MLIR_PYTHON :: compile_api/multiple_methods.py
    TORCH_MLIR_PYTHON :: compile_api/output_type_spec.py
    TORCH_MLIR_PYTHON :: compile_api/tracing.py
  
  
  Testing Time: 6.20s
  
  Total Discovered Tests: 17
    Passed: 8 (47.06%)
    Failed: 9 (52.94%)

I can't reproduce this. I wasn't able to reproduce it on stable before; I then tried updating the nightly commit to something newer, and all of these tests pass there as well (they were previously failing; the last CI run had these same errors on the nightly side)...

@raayandhar
Author

raayandhar commented Oct 17, 2025

I really don't understand what is causing the CI error in the Python regression tests. It's obviously related to the compiled C++ bindings, but I cannot reproduce it locally on stable or nightly (with the nightly version I'm using here). It might be related to caching; no idea.

zjgarvey added a commit that referenced this pull request Oct 22, 2025
I think this is the simplest approach, for now, to resolve #4343.

It would be good to eventually finish #4348; however, it became a bit
too much to rework the generated sources scripts in a timely fashion.

See also another parallel attempt to address the CI problems: #4345

This PR modifies the CMake PyTorch configure function to simply not set
any `TORCH_CXX_FLAGS` whenever PyTorch is missing the old
PYBIND_BUILD_ABI tag. I think the compiler flags we were pushing
through to make pybind think we are GCC and to use a specific ABI
version are now completely unnecessary. I was worried we might need
to update our pybind version in the requirements, but that appears not
to be relevant.

Additionally, nightly pins are updated and small fixes are made to
resolve misc failures in tests after the bump.

---------

Signed-off-by: zjgarvey <[email protected]>
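
The decision described in the commit message above can be sketched roughly as follows. This is only an illustration in Python, not the CMake code from the commit: the helper name `torch_cxx_flags` is hypothetical, and mapping a tag like `_cxxabi1018` to GCC's `-fabi-version=18` is an assumption about which flag would have been derived from the tag.

```python
import re
from typing import List, Optional


def torch_cxx_flags(pybind_build_abi: Optional[str]) -> List[str]:
    """Hypothetical helper: pick extra C++ flags from a pybind build-ABI tag.

    Returns no flags at all when the tag is missing, which is the
    behavior the commit message describes.
    """
    if not pybind_build_abi:
        return []
    match = re.fullmatch(r"_cxxabi10(\d{2})", pybind_build_abi)
    if match is None:
        # Unrecognized tag format: also fall back to no extra flags.
        return []
    # Assumption: a tag "_cxxabi10NN" corresponds to GCC's -fabi-version=NN.
    return [f"-fabi-version={int(match.group(1))}"]


print(torch_cxx_flags("_cxxabi1018"))  # ['-fabi-version=18']
print(torch_cxx_flags(None))           # []
```

The argument would be whatever tag the installed torch reports, or `None` when it reports none; in the second case the build simply uses the compiler's defaults.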

Development

Successfully merging this pull request may close these issues.

Build failures for new torch stable 2.9.0
