[syscall] E: Honour STB_WEAK undefined symbols#22
Open
esaurez wants to merge 1 commit into
Per the System V generic ABI (gABI, chapter "Symbol Table"), an
undefined symbol whose binding is `STB_WEAK` and which cannot be
resolved at dynamic-link time is silently taken to have the value zero
(or `NULL` for function symbols). Every mainstream ELF dynamic
loader -- glibc `elf/dl-lookup.c`, musl `ldso/dynlink.c`, FreeBSD
`libexec/rtld-elf/rtld.c`, Android Bionic
`linker/linker_relocate.cpp` -- implements this contract; the program
is responsible for null-checking the symbol before use.
The Nanvix in-process loader's `get_symbol_value()` previously
returned `Err(BadFile, "symbol not found")` unconditionally when a
referenced symbol could not be resolved, regardless of binding. As a
result, any `.so` containing a weak undefined reference -- which is
extremely common: libstdc++ keeps weak refs to `pthread_*` and
`__gmon_start__`, glibc-compatible compilers emit weak refs to
`__cxa_thread_atexit_impl` and TLS descriptors, and Rust crates often
expose optional integration hooks as weak symbols -- failed to
`dlopen()`, even though the program would have run correctly with the
spec-defined zero substitution.
This patch fills that gap:
- `src/libs/elf/src/elf32.rs` adds the `STB_LOCAL` / `STB_GLOBAL` /
`STB_WEAK` constants, the `ST_BIND_SHIFT` constant, and an
`Elf32Sym::st_bind()` accessor that extracts the binding nibble.
- `src/libs/elf/src/relocation.rs` lifts the raw `STB_*` values into
a typed `SymbolBinding` enum (`Local` / `Global` / `Weak` /
`Other`), wires `Symbol::binding()` through the goblin
`st_bind()` helper, and adds an `is_weak()` predicate. Unknown
or reserved binding values fall through to `SymbolBinding::Other`
so they are never silently treated as weak. Five unit tests
cover decoding, the `Other` fallback, the independence of the
binding nibble from the type nibble, and that `is_weak()` /
`is_undefined()` are orthogonal axes.
- `src/libs/syscall/src/dlfcn/syscall/dynlib.rs::get_symbol_value()`
intercepts the existing `lookup() == None` arm: if the referring
symbol is both `SHN_UNDEF` and `STB_WEAK`, return `Ok(0)` and log
at `debug!`. Strong undefined references continue to return the
pre-existing `BadFile` error, so `dlopen(RTLD_NOW)` still fails
on a genuine missing strong dependency.
- `src/libs/syscall/src/dlfcn/syscall/dynlib.rs::query()` skips
undefined entries before consulting `get_symbol_value()`. Without
this, `dladdr()` would report a ghost symbol at address `0` for
every weak undefined entry in the dynsym; that is correct
behaviour for relocation but is not what `dladdr` is meant to
surface.
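The binding decoding described in the first two bullets can be sketched as follows. This is a minimal illustration, not the patch itself: `Sym` is a hypothetical, trimmed-down stand-in for the loader's symbol type, and only the constant names, `st_bind()`, `binding()`, `is_weak()`, `is_undefined()` and the `SymbolBinding` variants are taken from the description above. The numeric values are the ones the gABI assigns.

```rust
// Sketch only: `Sym` is a hypothetical stand-in for the loader's symbol type.

/// Binding values defined by the gABI "Symbol Table" chapter.
pub const STB_LOCAL: u8 = 0;
pub const STB_GLOBAL: u8 = 1;
pub const STB_WEAK: u8 = 2;
/// The binding occupies the high nibble of `st_info`.
pub const ST_BIND_SHIFT: u8 = 4;
/// Section index that marks a symbol as undefined.
pub const SHN_UNDEF: u16 = 0;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SymbolBinding {
    Local,
    Global,
    Weak,
    /// Reserved, OS-specific, or processor-specific binding values.
    Other,
}

#[derive(Debug, Clone, Copy)]
pub struct Sym {
    pub st_info: u8,
    pub st_shndx: u16,
}

impl Sym {
    /// Extract the binding nibble; the low (type) nibble is discarded.
    pub fn st_bind(&self) -> u8 {
        self.st_info >> ST_BIND_SHIFT
    }

    /// Lift the raw value into the typed enum; unknown values never map to `Weak`.
    pub fn binding(&self) -> SymbolBinding {
        match self.st_bind() {
            STB_LOCAL => SymbolBinding::Local,
            STB_GLOBAL => SymbolBinding::Global,
            STB_WEAK => SymbolBinding::Weak,
            _ => SymbolBinding::Other,
        }
    }

    pub fn is_weak(&self) -> bool {
        self.binding() == SymbolBinding::Weak
    }

    /// Independent of the binding: a weak symbol may be defined or undefined.
    pub fn is_undefined(&self) -> bool {
        self.st_shndx == SHN_UNDEF
    }
}
```

Keeping `is_weak()` and `is_undefined()` as separate predicates means the loader has to ask for both explicitly before substituting zero, which is the condition the third bullet describes.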
Substituting zero is safe across the relocation types the loader
currently implements (`R_386_32`, `R_386_PC32`, `R_386_JMP_SLOT`,
`R_386_GLOB_DAT`): the resulting GOT / PLT entry or in-place 32-bit
slot is null, so any code path that actually dereferences the symbol
traps deterministically -- matching the contract the spec puts on the
program (it must null-check before use).
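To make the effect of the zero substitution concrete, here is a small sketch of the four relocation computations with the symbol value `S` set to 0. The formulas (`S + A`, `S + A - P`, `S`, `S`) are the ones the i386 psABI gives for these relocation types; the `RelocKind` enum, `relocated_word()` and `call_target()` are hypothetical names used only for this illustration and are not the loader's API.

```rust
/// Hypothetical names for the four i386 relocation types listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelocKind {
    /// R_386_32: S + A
    Abs32,
    /// R_386_PC32: S + A - P
    Pc32,
    /// R_386_GLOB_DAT: S
    GlobDat,
    /// R_386_JMP_SLOT: S
    JmpSlot,
}

/// Compute the 32-bit word written at the relocation site.
/// `s` = symbol value, `a` = addend, `p` = address of the relocated word.
pub fn relocated_word(kind: RelocKind, s: u32, a: u32, p: u32) -> u32 {
    match kind {
        RelocKind::Abs32 => s.wrapping_add(a),
        RelocKind::Pc32 => s.wrapping_add(a).wrapping_sub(p),
        RelocKind::GlobDat | RelocKind::JmpSlot => s,
    }
}

/// Effective target of a `call rel32` whose 4-byte displacement lives at `p`:
/// the CPU adds the displacement to the address of the next instruction.
pub fn call_target(p: u32, displacement: u32) -> u32 {
    p.wrapping_add(4).wrapping_add(displacement)
}
```

With `S = 0`, the `GlobDat` and `JmpSlot` slots are literally zero, and `Abs32` with a zero addend is too. For `Pc32` the stored word is `A - P` rather than zero, but with the usual addend of -4 for a call, `call_target()` evaluates to address 0, so the dereference still lands on the null page.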
Validated end-to-end on the Nanvix microvm:
- The new `dlfcn-weak-c` regression suite (proposed in
nanvix/posix-tests, see esaurez/posix-tests#1) exercises three
weak-undefined reference classes (function via GOT, data via GOT,
function via PLT) in both resolved-via-main-exe and missing
variants, plus a strong-undefined regression guard. All 7 cases
pass against this branch. The strong-undefined case still fails
`dlopen(RTLD_NOW)`, preserving the existing contract.
- CPython 3.12 successfully links against libstdc++ with weak refs
to `pthread_*` etc. left unresolved at .so-load time, runs
`hello.py`, then `dlopen()`s and exercises numpy 1.26.4 to
produce `NUMPY_TEST_OK` on the guest.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This was referenced May 29, 2026
ppenna pushed a commit to nanvix/cpython that referenced this pull request on Jun 8, 2026
Updates `Makefile.nanvix` so that `python.elf` correctly serves as the
"main module" against which extension `.so`s (numpy, ssl, lxml, future
pip-installed wheels, ...) resolve their C and C++ runtime symbols at
dlopen() time. This is the consumer-side companion to the Nanvix
loader's STB_WEAK support (esaurez/nanvix#22) and is gated on the new
libposix `pathconf` / `fpathconf` stubs (esaurez/nanvix#23) for the
configure conftest to even produce an executable.
Three coordinated link-flag changes to the `CONFIGURE_ENV` block:
1. `LIBS` segment 1 -- new `--whole-archive ... --no-whole-archive`
block ahead of the existing `--start-group`. Forces every object from
libposix, libc, libm, libstdc++, and libgcc into python.elf so the
runtime symbols extension `.so`s depend on are embedded (and
re-exported via `-Wl,--export-dynamic`, already present). Without
this, the static linker drops unreferenced objects (e.g. `fscanf`,
`longjmp`, `strtold_l` for numpy; `operator new/delete[]`, `__cxa_*`,
`_Unwind_*`, `std::type_info` vtables for any C++ extension) and
subsequent dlopen() of those `.so`s fails with "symbol not found".
2. `LIBS` segment 2 -- the existing `--start-group` is trimmed to just
the external add-on libraries (sqlite3, ssl, crypto, z, bz2, lzma,
ffi). It no longer re-lists libposix / libc / libm: those archives are
already fully included by segment 1, so the external libs can resolve
their references against the already-embedded objects.
3. Two new top-level Makefile vars `LIBSTDCXX := -lstdc++` and
`LIBGCC := -lgcc`. The GCC driver resolves them against its built-in
search paths (libgcc lives under a versioned
`lib/gcc/i686-nanvix/<gcc-version>/` directory, which would be fragile
to hardcode). Defined once at top level because the `-l` form is
identical between the docker and host build paths.
`LDFLAGS` is unchanged. The existing `-Wl,--allow-multiple-definition`
flag is kept and the surrounding comment is expanded to honestly
enumerate the duplicate-symbol categories the flag is masking (newlib
long-double math helpers, libposix/libc env+isatty overlaps, libc/libm
math helper overlaps, libgcc internal `__x86.get_pc_thunk.*`
duplicates, etc.) -- the set is large and
toolchain-build-version-dependent, and is the only practical
workaround until the contributing upstreams are fixed.
`.nanvix/config.py::configure_env()` -- an unused helper that mirrors
`Makefile.nanvix`'s `CONFIGURE_ENV` -- is kept in sync (same
`--whole-archive` LIBS, same LDFLAGS) and gains a docstring calling out
the dead-code status. A separate small cleanup PR can delete the helper
entirely.
Validated end-to-end on the Nanvix microvm: CPython 3.12 + numpy 1.26.4
runs `import numpy`, `np.arange`, `np.dot`, `reshape`, `flatten`,
broadcasting, all producing `NUMPY_TEST_OK`. Hello.py and the existing
single-process / multi-process / standalone modes are unaffected by the
change because the linker flags are not mode-conditional.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ppenna pushed a commit to nanvix/cpython that referenced this pull request on Jun 10, 2026
Summary
Teaches the in-process Nanvix dynamic loader to honour the System V ABI's `STB_WEAK` contract: an undefined weak symbol that cannot be resolved at dynamic-link time resolves to value `0`, not to a loader error. This matches every mainstream ELF loader and is a prerequisite for loading `.so` files produced by GCC/Clang: the C++ runtime, TLS bootstrap, and many GNU extensions all carry weak refs that may or may not be present at load time.
Why this is required
The current loader rejects any `.so` containing an undefined weak symbol that no loaded module defines. In practice that breaks:
- libstdc++, which keeps weak refs to `pthread_*` (so a static link to a single-threaded program still works), `__gmon_start__`, and several `_ITM_*` items. When the consumer of a libstdc++ `.so` is itself a static binary, none of those are defined anywhere -- the spec expects them to fold to 0; we expected them to be resolvable.
- Code from glibc-compatible compilers, which emit weak refs to `__cxa_thread_atexit_impl` and friends so that the C++ runtime can optionally hook into glibc's per-thread cleanup machinery. On Nanvix there is no glibc, so these stay undefined; the program is designed to fall through, but only if the loader complies with the ABI.
- Extension `.so`s (and most pip-installed wheels), which ship weak refs to glibc-isms via libstdc++. These prevented us from running anything beyond pure-Python code on Nanvix and forced the toolchain wrapper to pass `-Wl,--allow-shlib-undefined` as a workaround, which papered over weak and strong undefined refs alike -- defeating the linker's diagnostic value.
After this patch, `.so`s that exercise any of the above just work, and the toolchain wrapper can stop hiding link-time errors.
What changed
| File | Change |
| --- | --- |
| `src/libs/elf/src/elf32.rs` | `STB_LOCAL` / `STB_GLOBAL` / `STB_WEAK` constants, `ST_BIND_SHIFT`, and `Elf32Sym::st_bind()` accessor. |
| `src/libs/elf/src/relocation.rs` | `SymbolBinding` enum, `Symbol::binding()`, `Symbol::is_weak()`. Unknown bindings fall through to `Other` so they are never silently treated as weak. Five unit tests cover the helper. |
| `src/libs/syscall/src/dlfcn/syscall/dynlib.rs` | (1) In `get_symbol_value()`, when `lookup() == None`, return `Ok(0)` if the referring symbol is `SHN_UNDEF` and `STB_WEAK`. Strong undefined refs still error. (2) In `query()`, skip `SHN_UNDEF` entries before consulting `get_symbol_value()` so `dladdr()` does not report ghost symbols at address `0`. |
Substituting zero is safe across the relocation types the loader currently implements (`R_386_32`, `R_386_PC32`, `R_386_JMP_SLOT`, `R_386_GLOB_DAT`): the resulting GOT / PLT entry or in-place 32-bit slot is null, so any code path that actually dereferences the symbol traps deterministically -- matching the contract the spec puts on the program (it must null-check before use).
Specification
System V ABI ("gABI"), chapter Symbol Table:
Precedent in mainstream loaders
Every well-known ELF dynamic loader implements this rule. The contract is uniform; only the surrounding code style varies.
- glibc (`elf/dl-lookup.c`, `_dl_lookup_symbol_x`): after iterating every scope, if `current_value.s == NULL`, glibc inspects the reference symbol's binding. If it is `STB_WEAK`, glibc fills the lookup result with a synthesized `sym_val { 0, 0 }` and proceeds; otherwise it raises an unresolved-symbol error. The i386 machine-specific reloc handler `sysdeps/i386/dl-machine.h` writes the resulting zero through `R_386_GLOB_DAT` / `R_386_JMP_SLOT` like any other relocation.
- musl (`ldso/dynlink.c`, `find_sym2` and the relocation loop in `do_relocs`): `find_sym2` returns a sentinel with `.sym = 0` on lookup failure; the relocation loop then checks `ELF_ST_BIND(sym->st_info) == STB_WEAK` and skips the error path entirely when the binding is weak.
- FreeBSD `rtld-elf` (`libexec/rtld-elf/rtld.c`): `find_symdef()` returns a `sym_zero` placeholder for unresolved weak references, populated at rtld startup as a zero-valued symbol. All architecture-specific reloc handlers (`reloc.c`) treat that placeholder as a normal zero-valued definition.
- Android Bionic (`linker/linker_relocate.cpp`): `lookup_symbol()` returns a null `sym` for unresolved refs, and the relocation loop branches on `is_weak`. Weak unresolved refs proceed with `sym_addr = 0`; strong unresolved refs report an error via `DL_ERR`.
The behaviour we now implement is the intersection of all four: unresolved weak undef → value 0; unresolved strong undef → loader error. Nothing more, nothing less.
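The shared rule can be sketched in a few lines. This is an illustrative reduction under stated assumptions, not the code in `dynlib.rs`: `SymRef`, `LoadError` and the closure-based `lookup` parameter are hypothetical, and the real `get_symbol_value()` returns the loader's own `BadFile` error rather than the enum used here.

```rust
/// Hypothetical error type standing in for the loader's `BadFile` error.
#[derive(Debug, PartialEq, Eq)]
pub enum LoadError {
    SymbolNotFound(String),
}

/// Hypothetical view of the referring symbol: just what the decision needs.
pub struct SymRef<'a> {
    pub name: &'a str,
    pub is_weak: bool,
    pub is_undefined: bool,
}

/// Resolve a symbol reference to the value used by relocation processing.
/// `lookup` stands in for the search over all loaded modules.
pub fn get_symbol_value(
    sym: &SymRef<'_>,
    lookup: impl Fn(&str) -> Option<u32>,
) -> Result<u32, LoadError> {
    match lookup(sym.name) {
        // Some module defines the symbol: use its address.
        Some(addr) => Ok(addr),
        // Unresolved weak undefined reference: the gABI value is zero.
        None if sym.is_undefined && sym.is_weak => Ok(0),
        // Unresolved strong reference: still a hard loader error.
        None => Err(LoadError::SymbolNotFound(sym.name.to_string())),
    }
}
```

The guard only fires after lookup has failed, so a weak reference that some loaded module does define resolves to that definition exactly as a strong reference would.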
Validation
- In-tree unit tests in `src/libs/elf/src/relocation.rs::tests` cover binding decoding, the `Other` fallback for unknown bindings, the independence of the binding nibble from the type nibble, and that `is_weak()` and `is_undefined()` are orthogonal predicates.
- `dlfcn-weak-c` regression suite (proposed at "[dlfcn] E: Add dlfcn-weak-c tests for STB_WEAK loader semantics", posix-tests#1) drives all the observable contract through the public `dlfcn` API:

| Library | Relocation | Symbol | Expected result |
| --- | --- | --- | --- |
| `libstrong-missing.so` | `R_386_JUMP_SLOT` | `strong_missing` | `dlopen(RTLD_NOW)` fails |
| `libweak-func-resolved.so` | `R_386_GLOB_DAT` | `main_callback` | |
| `libweak-func-missing.so` | `R_386_GLOB_DAT` | `missing_callback` | `&fn == NULL` |
| `libweak-data-resolved.so` | `R_386_GLOB_DAT` | `weak_data` | |
| `libweak-data-missing.so` | `R_386_GLOB_DAT` | `missing_weak_data` | `&data == NULL` |
| `libweak-plt-resolved.so` | `R_386_JUMP_SLOT` | `main_callback` | |
| `libweak-plt-missing.so` | `R_386_JUMP_SLOT` | `missing_plt_callback` | `dlopen(RTLD_NOW)` succeeds (no call) |

Case 5 -- the strong-undefined regression guard -- runs first and passes both before and after this patch, confirming that strong undef behaviour is unchanged. Cases 1-4 and 6-7 only pass with this patch applied. All 7 pass end-to-end on the Nanvix microvm against this branch.
- CPython 3.12 + numpy 1.26.4 end-to-end: cpython links against libstdc++ (weak `pthread_*` etc. left unresolved at .so-load time), runs `hello.py`, then `import numpy; numpy.array(...).sum()` via `dlopen()` of the numpy `.so` family; the test harness prints `NUMPY_TEST_OK`.
Compatibility
- `SymbolBinding` is new but does not displace any existing helper.
- `dladdr()` behaviour improves (no more ghost zero-address symbols from undefined dynsym entries).
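The `dladdr()` point can be illustrated with a small address-to-symbol search that skips undefined entries. This is a sketch under assumptions: `DynSym` and the shape of `query()` here are hypothetical simplifications, and the real `query()` in `dynlib.rs` consults `get_symbol_value()` rather than a stored `value` field.

```rust
/// Section index that marks a symbol as undefined.
pub const SHN_UNDEF: u16 = 0;

/// Hypothetical, simplified dynsym entry with an already-resolved value.
pub struct DynSym<'a> {
    pub name: &'a str,
    pub value: u32,
    pub size: u32,
    pub shndx: u16,
}

/// Return the name of the defined symbol whose range contains `addr`.
/// Undefined entries are skipped first, so a weak undefined reference that
/// resolved to 0 is never reported as a symbol located at address 0.
pub fn query<'a>(dynsym: &[DynSym<'a>], addr: u32) -> Option<&'a str> {
    dynsym
        .iter()
        .filter(|s| s.shndx != SHN_UNDEF)
        // Treat zero-sized symbols as covering one byte so they can match.
        .find(|s| addr >= s.value && addr - s.value < s.size.max(1))
        .map(|s| s.name)
}
```

Filtering before the range check keeps the zero substitution visible only to relocation processing, which is the split between relocation and `dladdr()` behaviour that the description calls for.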