35 changes: 27 additions & 8 deletions cpython-unix/build-cpython.sh
@@ -683,6 +683,14 @@ fi
# This ensures we can run the binary in any location without
# LD_LIBRARY_PATH pointing to the directory containing libpython.
if [ "${PYBUILD_SHARED}" = "1" ]; then
(
shopt -s nullglob
dylibs=(${TOOLS_PATH}/deps/lib/lib*.dylib ${TOOLS_PATH}/deps/lib/lib*.so)
if [ "${#dylibs[@]}" -gt 0 ]; then
cp -av "${dylibs[@]}" ${ROOT}/out/python/install/lib/
fi
)

if [[ "${PYBUILD_PLATFORM}" = macos* ]]; then
# There's only 1 dylib produced on macOS and it has the binary suffix.
LIBPYTHON_SHARED_LIBRARY_BASENAME=libpython${PYTHON_MAJMIN_VERSION}${PYTHON_BINARY_SUFFIX}.dylib
@@ -821,6 +829,25 @@ if [ "${PYBUILD_SHARED}" = "1" ]; then
${ROOT}/out/python/install/lib/libpython3.so
fi
fi

# PyInstaller would like to see `ldd` work on modules.
# https://github.com/pyinstaller/pyinstaller/issues/9204#issuecomment-3171583553
# Also this probably helps programs linking libpython avoid having to set an rpath.
patchelf_args=()
if [ "${CC}" == "musl-clang" ]; then
patchelf_args+=(--set-rpath '${ORIGIN}/../..')
else
for lib in ${ROOT}/out/python/install/lib/*; do
basename=${lib##*/}
patchelf_args+=(--replace-needed "$basename" '${ORIGIN}/../../'"$basename")
done
Comment on lines +840 to +843

Is this actually necessary? (Fedora 42 container)

$ ldd /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007ffcc89b4000)
        libtcl8.6.so => not found
        libtk8.6.so => not found
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007220971fa000)
        libc.so.6 => /lib64/libc.so.6 (0x0000722097008000)
        /lib64/ld-linux-x86-64.so.2 (0x0000722097213000)

$ patchelf --set-rpath '$ORIGIN/../..' /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so

$ ldd /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007ffd75f99000)
        libtcl8.6.so => /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/../../libtcl8.6.so (0x000079bbb2be6000)
        libtk8.6.so => /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/../../libtk8.6.so (0x000079bbb2930000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000079bbb292a000)
        libc.so.6 => /lib64/libc.so.6 (0x000079bbb2738000)
        libdl.so.2 => /lib64/libdl.so.2 (0x000079bbb2732000)
        libm.so.6 => /lib64/libm.so.6 (0x000079bbb2644000)
        /lib64/ld-linux-x86-64.so.2 (0x000079bbb2db6000)

You can see that just setting the RPath was sufficient; there's no need to be stricter by rewriting the DT_NEEDED entries with path prefixes. At runtime, DT_RUNPATH adds search path(s) on top of whatever LD_LIBRARY_PATH and the system's linker cache (ldconfig --print-cache on glibc; musl's equivalent differs) already provide.

Both of these runtime configs also satisfy PyInstaller's library discovery (and the executable PyInstaller distributes sets LD_LIBRARY_PATH for its own process at runtime, pointing at where its bundled libs are), although the relative RPath is the better approach (and remains compatible).
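To make the $ORIGIN mechanics concrete, here is a minimal Rust sketch (hypothetical helper names, not code from this PR) of how a DT_RUNPATH entry such as `$ORIGIN/../..` is expanded relative to the directory of the module that carries it. From `lib/python3.X/lib-dynload/` it lands on `lib/`, where the build copies libtcl8.6.so and libtk8.6.so. The braced `${ORIGIN}` spelling used in the build script is treated the same way.

```rust
use std::path::{Component, Path, PathBuf};

/// Expand `$ORIGIN` / `${ORIGIN}` in one DT_RUNPATH entry. `origin_dir` is the
/// directory containing the ELF object being loaded, not the current directory.
/// Simplification: a real loader is stricter about where the token ends;
/// this sketch does plain substring replacement.
fn expand_origin(entry: &str, origin_dir: &Path) -> PathBuf {
    let origin = origin_dir.to_string_lossy();
    // "$ORIGIN" never matches inside "${ORIGIN}" (a brace follows the `$`),
    // so the two replacements do not interfere with each other.
    PathBuf::from(entry.replace("${ORIGIN}", &origin).replace("$ORIGIN", &origin))
}

/// Collapse `.` and `..` lexically, without consulting the filesystem
/// (so symlinked directories are not taken into account).
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

For a module in `/opt/python/lib/python3.13/lib-dynload`, `normalize(&expand_origin("$ORIGIN/../..", dir))` yields `/opt/python/lib`.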

For reference (Ubuntu 24.04 container), this is what PyTorch does as well (only adding extra search paths via RPath):

$ ldd .venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so

        linux-vdso.so.1 (0x00007fffd2b98000)
        libc10_cuda.so => /example/.venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so (0x00007a2e73bc8000)
        libcudart.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12 (0x00007a2e73800000)
        libcusparse.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12 (0x00007a2e5c200000)
        libcufft.so.11 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cufft/lib/libcufft.so.11 (0x00007a2e4ae00000)
        libcufile.so.0 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cufile/lib/libcufile.so.0 (0x00007a2e4ab03000)
        libcusparseLt.so.0 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparselt/lib/libcusparseLt.so.0 (0x00007a2e2f993000)
        libnccl.so.2 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2 (0x00007a2e16e00000)
        libcurand.so.10 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/curand/lib/libcurand.so.10 (0x00007a2e0e200000)
        libcublas.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12 (0x00007a2e07000000)
        libcublasLt.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12 (0x00007a2dd4c00000)
        libcudnn.so.9 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.9 (0x00007a2dd4800000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007a2e73bb8000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007a2e73bb1000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007a2e73bac000)
        libc10.so => /example/.venv/lib/python3.10/site-packages/torch/lib/libc10.so (0x00007a2e736ed000)
        libtorch_cpu.so => /example/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so (0x00007a2dc0348000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007a2dc00ca000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007a2e73ac1000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007a2e736bf000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007a2dbfeb8000)
        /lib64/ld-linux-x86-64.so.2 (0x00007a2eaf861000)
        libnvJitLink.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/../../nvjitlink/lib/libnvJitLink.so.12 (0x00007a2dba3d0000)
        libgomp.so.1 => /example/.venv/lib/python3.10/site-packages/torch/lib/libgomp.so.1 (0x00007a2dba000000)
        libcupti.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_cupti/lib/libcupti.so.12 (0x00007a2db9892000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007a2e73aba000)
$ patchelf --print-needed .venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so

libc10_cuda.so
libcudart.so.12
libcusparse.so.12
libcufft.so.11
libcufile.so.0
libcusparseLt.so.0
libnccl.so.2
libcurand.so.10
libcublas.so.12
libcublasLt.so.12
libcudnn.so.9
librt.so.1
libdl.so.2
libpthread.so.0
libc10.so
libtorch_cpu.so
libstdc++.so.6
libm.so.6
libgcc_s.so.1
libc.so.6
ld-linux-x86-64.so.2
$ patchelf --print-rpath .venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so

$ORIGIN/../../nvidia/cublas/lib:$ORIGIN/../../nvidia/cuda_cupti/lib:$ORIGIN/../../nvidia/cuda_nvrtc/lib:$ORIGIN/../../nvidia/cuda_runtime/lib:$ORIGIN/../../nvidia/cudnn/lib:$ORIGIN/../../nvidia/cufft/lib:$ORIGIN/../../nvidia/curand/lib:$ORIGIN/../../nvidia/cusolver/lib:$ORIGIN/../../nvidia/cusparse/lib:$ORIGIN/../../nvidia/cusparselt/lib:$ORIGIN/../../cusparselt/lib:$ORIGIN/../../nvidia/nccl/lib:$ORIGIN/../../nvidia/nvtx/lib:$ORIGIN/../../nvidia/cufile/lib:$ORIGIN
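The two approaches in this thread differ only in which patchelf arguments are applied to each lib-dynload module. The following Rust sketch builds both argument lists, mirroring the shell loop in the PR: one shared `--set-rpath` (what the PR does for musl-clang, and what PyTorch does) versus one `--replace-needed` triple per shipped library (what the PR does otherwise). The `Strategy` enum and `patchelf_args` function are hypothetical, for illustration only.

```rust
/// Hypothetical: the two ways of pointing lib-dynload modules at `lib/`.
enum Strategy {
    /// Add a search path; DT_NEEDED entries keep their bare names.
    RunPath,
    /// Rewrite each DT_NEEDED entry to an $ORIGIN-relative path.
    ReplaceNeeded,
}

/// Build the patchelf arguments for one module, given the basenames of the
/// libraries shipped in `lib/` (the shell version globs `lib/*`).
fn patchelf_args(strategy: Strategy, shipped_libs: &[&str]) -> Vec<String> {
    match strategy {
        // A single runpath entry covers every shipped library.
        Strategy::RunPath => vec!["--set-rpath".to_string(), "${ORIGIN}/../..".to_string()],
        // One triple per library, whether or not the module needs it.
        Strategy::ReplaceNeeded => shipped_libs
            .iter()
            .flat_map(|lib| {
                [
                    "--replace-needed".to_string(),
                    lib.to_string(),
                    format!("${{ORIGIN}}/../../{lib}"),
                ]
            })
            .collect(),
    }
}
```

The runpath variant stays constant as libraries are added, while the replace-needed variant grows linearly and hard-codes each library's final location into the module.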

fi
# At the moment, python3 and libpython don't have shared-library
# dependencies, but at some point we will want to run this for
# them too.
for module in ${ROOT}/out/python/install/lib/python*/lib-dynload/*.so; do
patchelf "${patchelf_args[@]}" "$module"
done
fi
fi

@@ -1247,14 +1274,6 @@ if [ -d "${TOOLS_PATH}/deps/lib/tcl8" ]; then
for source in ${TOOLS_PATH}/deps/lib/{itcl4.2.4,tcl8,tcl8.6,thread2.8.9,tk8.6}; do
cp -av $source ${ROOT}/out/python/install/lib/
done

(
shopt -s nullglob
dylibs=(${TOOLS_PATH}/deps/lib/lib*.dylib ${TOOLS_PATH}/deps/lib/lib*.so)
if [ "${#dylibs[@]}" -gt 0 ]; then
cp -av "${dylibs[@]}" ${ROOT}/out/python/install/lib/
fi
)
fi

# Copy the terminfo database if present.
125 changes: 106 additions & 19 deletions src/validation.rs
@@ -265,24 +265,32 @@ static ELF_ALLOWED_LIBRARIES_BY_TRIPLE: Lazy<HashMap<&'static str, Vec<&'static
.collect()
});

static ELF_ALLOWED_LIBRARIES_BY_MODULE: Lazy<HashMap<&'static str, Vec<&'static str>>> =
Lazy::new(|| {
[
(
// libcrypt is provided by the system, but only on older distros.
"_crypt",
vec!["libcrypt.so.1"],
),
(
// libtcl and libtk are shipped in our distribution.
"_tkinter",
vec!["libtcl8.6.so", "libtk8.6.so"],
),
]
.iter()
.cloned()
.collect()
});
#[derive(Copy, Clone, PartialEq)]
enum DepSource {
SystemRequired,
SystemOptional,
Vendored,
}
use DepSource::*;

static ELF_ALLOWED_LIBRARIES_BY_MODULE: Lazy<
HashMap<&'static str, Vec<(&'static str, DepSource)>>,
> = Lazy::new(|| {
[
(
// libcrypt is provided by the system, but only on older distros.
"_crypt",
vec![("libcrypt.so.1", SystemOptional)],
),
(
"_tkinter",
vec![("libtcl8.6.so", Vendored), ("libtk8.6.so", Vendored)],
),
]
.iter()
.cloned()
.collect()
});

static DARWIN_ALLOWED_DYLIBS: Lazy<Vec<MachOAllowedDylib>> = Lazy::new(|| {
[
@@ -1022,7 +1030,7 @@ fn validate_elf<Elf: FileHeader<Endian = Endianness>>(
if let Some(filename) = path.file_name() {
if let Some((module, _)) = filename.to_string_lossy().split_once(".cpython-") {
if let Some(extra) = ELF_ALLOWED_LIBRARIES_BY_MODULE.get(module) {
allowed_libraries.extend(extra.iter().map(|x| x.to_string()));
allowed_libraries.extend(extra.iter().map(|x| x.0.to_string()));
}
}
}
@@ -2186,6 +2194,85 @@ fn verify_distribution_behavior(dist_path: &Path) -> Result<Vec<String>> {
errors.push("errors running interpreter tests".to_string());
}

// Explicitly test ldd directly on the extension modules, which PyInstaller
// relies on. This is not strictly needed for a working distribution (e.g.
// you can set an rpath on just python+libpython), so we test here for
// compatibility with tools that run ldd.
// https://github.com/pyinstaller/pyinstaller/issues/9204#issuecomment-3171050891
// TODO(geofft): musl doesn't do lazy binding for the argument to
// ldd, so we will get complaints about missing Py_* symbols. Need
// to handle this somehow, skip testing for now.
Comment on lines +2203 to +2205

Can't you just filter out those lines?

$ /lib/ld-musl-x86_64.so.1 --list /root/.local/share/uv/python/cpython-3.13.9-linux-x86_64-musl/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-musl.so 2>&1 | grep -v relocating

        /lib/ld-musl-x86_64.so.1 (0x7453c8bed000)
Error loading shared library libtcl8.6.so: No such file or directory (needed by /root/.local/share/uv/python/cpython-3.13.9-linux-x86_64-musl/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-musl.so)
Error loading shared library libtk8.6.so: No such file or directory (needed by /root/.local/share/uv/python/cpython-3.13.9-linux-x86_64-musl/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-musl.so)
        libc.so => /lib/ld-musl-x86_64.so.1 (0x7453c8bed000)
$ patchelf --print-needed /root/.local/share/uv/python/cpython-3.13.9-linux-x86_64-musl/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-musl.so

libtcl8.6.so
libtk8.6.so
libc.so
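A Rust sketch of that filtering, assuming the loader's stdout and stderr are merged as with `2>&1` above: drop the "relocating" lines (musl binds symbols eagerly, so Py_* symbols that only the interpreter provides show up as relocation errors) and collect the library names from the "Error loading shared library" lines. The function name and the sample output in the test are illustrative.

```rust
/// Hypothetical helper: list the libraries musl's loader could not find,
/// given its `--list` output with stderr merged into stdout.
fn musl_missing_libs(output: &str) -> Vec<&str> {
    output
        .lines()
        // Same effect as `grep -v relocating`: ignore unresolved-symbol noise.
        .filter(|line| !line.contains("relocating"))
        .filter_map(|line| {
            // "Error loading shared library NAME: reason (needed by ...)"
            line.trim()
                .strip_prefix("Error loading shared library ")?
                .split_once(':')
                .map(|(name, _)| name)
        })
        .collect()
}
```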

if cfg!(target_os = "linux") && !python_json.target_triple.contains("-musl") {
// musl's ldd is packaged in the "musl-tools" Debian package.
let ldd = if python_json.target_triple.contains("-musl") && cfg!(not(target_env = "musl")) {
"musl-ldd"
} else {
"ldd"
};
for (name, variants) in python_json.build_info.extensions.iter() {
for ext in variants {
let Some(shared_lib) = &ext.shared_lib else {
continue;
};
let shared_lib_path = temp_dir.path().join("python").join(shared_lib);
let output = duct::cmd(ldd, [shared_lib_path])
.unchecked()
.stdout_capture()
.run()
.context(format!("Failed to run `{ldd} {shared_lib}`"))?;
let stdout = String::from_utf8_lossy(&output.stdout);
// Format of ldd output, for both glibc and musl:
// - Everything starts with a tab.
// - Most things are "libxyz.so.1 => /usr/lib/libxyz.so.1 (0xabcde000)".
// - The ELF interpreter is displayed as just "/lib/ld.so (0xabcde000)".
// - glibc, but not musl, shows the vDSO as "linux-vdso.so.1 (0xfffff000)".
Comment on lines +2227 to +2229

This is not true regarding glibc vs musl for the vDSO: just look at the ldd output I showed from the Ubuntu and Fedora containers, which use glibc and still include linux-vdso.so.1 (these are transitive from glibc).

Realistically, shouldn't you only be interested in direct dependencies, as listed in patchelf --print-needed output (which has no tab indentation)? That shows what's actually encoded as DT_NEEDED entries.

Anything transitive is resolved at runtime, so ldd is useful for troubleshooting that dependency chain in a given runtime environment, for example by filtering for "not found" to spot anything that failed to resolve. Likewise, you could validate that the libraries resolved to the paths you expect.
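Checking direct dependencies only could look like the following Rust sketch: parse `patchelf --print-needed` output (one DT_NEEDED name per line, unindented) and report names that are not in an allowlist. The function and its `allowed` parameter are hypothetical; in validation.rs the per-module entries come from ELF_ALLOWED_LIBRARIES_BY_MODULE.

```rust
/// Hypothetical check: return DT_NEEDED entries (as printed by
/// `patchelf --print-needed`, one per line) that are not in `allowed`.
fn unexpected_needed<'a>(print_needed: &'a str, allowed: &[&str]) -> Vec<&'a str> {
    print_needed
        .lines()
        .map(str::trim)
        // Skip blank lines defensively.
        .filter(|name| !name.is_empty())
        .filter(|name| !allowed.iter().any(|a| a == name))
        .collect()
}
```

Unlike ldd output, this does not depend on the host's loader or on which libraries happen to be installed, so the result is the same on glibc and musl hosts.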

// - If a library is listed in DT_NEEDED with an absolute path, or (currently only
// supported on glibc) with an $ORIGIN-relative path, it displays as just
// "/path/to/libxyz.so (0xabcde000)".
// - On glibc, if a library cannot be found ldd returns zero and shows "=> not
// found" as the resolution (even if it wouldn't use the => form if found).
// - On musl, if a library cannot be found, ldd returns nonzero and shows "Error
// loading shared library ...:" on stderr.
Comment on lines +2233 to +2236

With glibc you'll often find that ldd is a bash script (it varies by distro) wrapping the actual runtime dynamic linker, which you can call with --list to resolve libraries:

# `/usr/bin/ldd` is a bash script wrapper around this:
$ /lib64/ld-linux-x86-64.so.2 --list /lib64/libc.so.6
        /lib64/ld-linux-x86-64.so.2 (0x00007f8ead016000)
        linux-vdso.so.1 (0x00007fff355f6000)

# Failure example:
$ /lib64/ld-linux-x86-64.so.2 --list /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so

/root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so: error while loading shared libraries: libtcl8.6.so: cannot open shared object file: No such file or directory

As you can see, the error message differs from the "not found" that the ldd script outputs.
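The two glibc failure formats shown here can be told apart with a small parser. This Rust sketch uses hypothetical function names: one handles the loader's own "error while loading shared libraries" message, the other the ldd script's "=> not found" resolution (the form the PR's validation code already matches).

```rust
/// Loader format: "PATH: error while loading shared libraries: NAME: reason".
fn glibc_loader_missing_lib(line: &str) -> Option<&str> {
    let (_, rest) = line.split_once("error while loading shared libraries: ")?;
    rest.split_once(':').map(|(name, _)| name)
}

/// ldd script format: "\tNAME => not found".
fn glibc_ldd_missing_lib(line: &str) -> Option<&str> {
    let (name, resolution) = line.trim().split_once(" => ")?;
    resolution.starts_with("not found").then_some(name)
}
```

Note that the loader stops at the first missing library, whereas the ldd script lists every unresolved entry.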

Alpine has a very simple one for musl:

$ docker run --rm -it alpine
$ cat /usr/bin/ldd
#!/bin/sh
exec /lib/ld-musl-x86_64.so.1 --list "$@"

For a test that validates that the direct deps of a library/executable are found, you should be able to use patchelf --print-needed to get the list of deps, then check each one against the patchelf --print-rpath paths, followed by the system's standard search paths (a match in ldconfig --print-cache output should work; just don't use that on Alpine). Wouldn't that be mostly glibc/musl agnostic, with exceptions like libc.so => /lib/ld-musl-x86_64.so.1?
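The lookup order suggested here could be sketched in Rust as follows. This is a hypothetical helper, not part of the PR. `runpath` is assumed to be already $ORIGIN-expanded, `system_dirs` stands in for the system search paths, and `exists` replaces a real filesystem check so the logic can be tested. It ignores LD_LIBRARY_PATH, DT_RPATH, and glibc's ld.so.cache, and the real search order differs between glibc and musl. On glibc, checking only DT_NEEDED entries matches how DT_RUNPATH works, since it is consulted only for the object's own direct dependencies.

```rust
use std::path::{Path, PathBuf};

/// Resolve each DT_NEEDED name against the runpath directories first, then
/// the system directories. Returns (resolved name and path, unresolved names).
fn resolve_needed<'a>(
    needed: &[&'a str],
    runpath: &[PathBuf],
    system_dirs: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> (Vec<(&'a str, PathBuf)>, Vec<&'a str>) {
    let mut found = Vec::new();
    let mut missing = Vec::new();
    for &name in needed {
        // First directory (in search order) that contains the library wins.
        let hit = runpath
            .iter()
            .chain(system_dirs)
            .map(|dir| dir.join(name))
            .find(|candidate| exists(candidate.as_path()));
        match hit {
            Some(path) => found.push((name, path)),
            None => missing.push(name),
        }
    }
    (found, missing)
}
```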

if !output.status.success() {
// TODO: If we ever have any optional dependencies besides libcrypt (which is
// glibc-only), we will need to capture musl ldd's stderr and parse it.
errors.push(format!(
"`{ldd} {shared_lib}` exited with {}:\n{stdout}",
output.status
));
Comment on lines +2237 to +2243

Perhaps instead of relying on the ldd CLI, you could use an equivalent crate like elb-dl? It can resolve ELF dependencies for binaries that use either the glibc or the musl dynamic loader: https://crates.io/crates/elb-dl

Wouldn't it be more consistent to implement this logic with a library crate, without the caveats of handling different programs' output?

There's a related crate for patchelf functionality too: https://docs.rs/elb

} else {
let mut ldd_errors = vec![];
let deps = ELF_ALLOWED_LIBRARIES_BY_MODULE.get(&name[..]);
let temp_dir_lossy = temp_dir.path().to_string_lossy().into_owned();
for line in stdout.lines() {
let Some((needed, resolution)) = line.trim().split_once(" => ") else {
continue;
};
let dep_source = deps
.and_then(|deps| {
deps.iter().find(|dep| dep.0 == needed).map(|dep| dep.1)
})
.unwrap_or(SystemRequired);
if resolution.starts_with("not found") && dep_source != SystemOptional {
ldd_errors.push(format!("{needed} was expected to be found"));
} else if !resolution.contains(&temp_dir_lossy) && dep_source == Vendored {
ldd_errors.push(format!(
"{needed} should not come from the OS (missing rpath/$ORIGIN?)"
));
}
}
if !ldd_errors.is_empty() {
errors.push(format!(
"In `{ldd} {shared_lib}`:\n - {}\n{stdout}",
ldd_errors.join("\n - ")
));
}
}
}
}
}

Ok(errors)
}
