-
Notifications
You must be signed in to change notification settings - Fork 607
fix: grab PyTorch ABI version from libtorch_python.so #4345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: raayandhar <[email protected]>
Signed-off-by: raayandhar <[email protected]>
|
Managed to repro locally and do some digging into what's happening - this manages to build now locally, hopefully it will work with CI as well. |
|
So I'm not sure if it is sufficient to grep for the CXX abi version in the shared object file. E.g., if I install the nightly build we have pinned at 8/20, the We might need to figure out a different approach. |
|
However, I do find |
If I remember correctly, I had tried looking yesterday and did not find any |
|
Yeah, honestly, I have no idea what the right fix is. I've tried modifying a few things, e.g., updating pybind to 3.0.1 to match pytorch and removing explicit CXX_ABI flags. No matter what, if it compiles, then it fails tests due to an opaque error about types (likely meaning something went wrong with the abi compatibility). I'm really not sure if it is tenable to fix this in a short amount of time, and considering this is blocking all work on this repo, I'm going to work on pulling out the e2e testing from |
Signed-off-by: raayandhar <[email protected]>
|
The recent error in CI, with these failures: I can't reproduce - I wasn't able to reproduce on stable before, then tried updating the nightly commit to something newer, and I pass all of these tests as well (when they were previously failing, just the last CI had these same errors on nightly side)... |
|
Really not understanding how the error in CI is being caused in the Python regression tests. It's obviously related to the compiled C++ bindings, but cannot reproduce locally on stable or nightly (with the nightly version I'm using here). Might be related to cache stuff, no idea. |
I think this is the simplest approach, for now, to resolve #4343 It would be good to eventually finish #4348 ; however, it became a bit too much to rework the generated sources scripts in a timely fashion. See also another parallel attempt to address the ci problems: #4345 This PR modifies the Cmake pytorch configure function to simply not set any `TORCH_CXX_FLAGS` whenever pytorch is missing the old PYBIND_BUILD_ABI tag. I think whatever compiler flags we were pushing through to make pybind think we are GCC and to use a specific ABI version is just completely unnecessary now. I was worried we might need to update our pybind version in the requirements, but it appears to not be relevant. Additionally, nightly pins are updated and small fixes are made to resolve misc failures in tests after the bump. --------- Signed-off-by: zjgarvey <[email protected]>
In theory this should work to fix #4343. But I can't reproduce locally yet so not sure. Working on repro-ing locally first.