
[build] E: Build libxslt.so and libexslt.so#1

Closed
esaurez wants to merge 1 commit into nanvix/v1.1.42 from feat/build-shared-library

esaurez (Owner) commented Jun 4, 2026

Summary

Produce position-independent libxslt.so and libexslt.so alongside the existing static .a archives, wired as a real DT_NEEDED chain on top of esaurez/libxml2#1's libxml2.so:

libxslt.so   -> NEEDED libxml2.so
libexslt.so  -> NEEDED libxslt.so, NEEDED libxml2.so

Only each .so's own .a is embedded via --whole-archive; the lower layers (libxml2, libz) are not bundled. Instead, the Nanvix dynamic loader pulls them in transitively at dlopen time, using the DT_NEEDED chain support added in esaurez/nanvix#27.
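As an illustration of what "pulls them in transitively" means, here is a minimal Python sketch of a loader walking this chain breadth-first and mapping each library once. Breadth-first is the order used by, e.g., glibc's dynamic loader; whether the Nanvix loader from esaurez/nanvix#27 follows exactly this order is an assumption, and the `NEEDED` table and `load_order` helper are hypothetical.

```python
from collections import deque

# Hypothetical model of the DT_NEEDED graph above; the real entries live in
# each .so's dynamic section. libxml2.so's own dependencies are left out.
NEEDED: dict[str, list[str]] = {
    "libexslt.so": ["libxslt.so", "libxml2.so"],
    "libxslt.so": ["libxml2.so"],
    "libxml2.so": [],
}


def load_order(root: str, needed: dict[str, list[str]]) -> list[str]:
    """Return the breadth-first order in which `root` and its transitive
    DT_NEEDED entries would be mapped, visiting each library only once."""
    order: list[str] = []
    seen = {root}
    queue = deque([root])
    while queue:
        lib = queue.popleft()
        order.append(lib)
        for dep in needed.get(lib, []):
            if dep not in seen:  # skip libraries already queued or mapped
                seen.add(dep)
                queue.append(dep)
    return order


if __name__ == "__main__":
    print(load_order("libexslt.so", NEEDED))
```

Because libxml2.so is marked as seen when libexslt.so's entries are queued, reaching it again through libxslt.so does not load a second copy.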

Size impact (vs the discarded self-contained prototype)

| Artifact | Self-contained | DT_NEEDED chain |
|---|---|---|
| libxslt.so | 1.8 MB | 296 KB |
| libexslt.so | 1.9 MB | 92 KB |

That is ~3.3 MB reclaimed in this repository alone (about 1.5 MB from libxslt.so and 1.8 MB from libexslt.so). Across the full stack (libxml2 → libxslt → libexslt → liblxml_etree), the chain also eliminates the 6 MB of duplicated libxml2 code.
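The ~3.3 MB figure can be re-derived from the two table rows. A quick check, assuming binary megabytes (1 MB = 1024 KB); with decimal megabytes the total is 3312 KB, which also rounds to 3.3 MB.

```python
# Stripped sizes from the size table, in KB (assumption: 1 MB = 1024 KB).
SIZES_KB: dict[str, tuple[float, float]] = {
    # artifact: (self-contained prototype, DT_NEEDED chain)
    "libxslt.so": (1.8 * 1024, 296),
    "libexslt.so": (1.9 * 1024, 92),
}


def reclaimed_kb(sizes: dict[str, tuple[float, float]]) -> float:
    """Total KB saved by switching every artifact to the DT_NEEDED chain."""
    return sum(before - after for before, after in sizes.values())


total = reclaimed_kb(SIZES_KB)
print(f"{total:.0f} KB ~ {total / 1024:.1f} MB")
```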

Implementation

| Change | Detail |
|---|---|
| `--with-pic`, `-fPIC` in CFLAGS | Same .o files reusable for .a and .so |
| Keep `--disable-shared` | libtool has no rules for i686-nanvix; the .so files are linked manually |
| New `$(SHAREDLIB_XSLT)` / `$(SHAREDLIB_EXSLT)` targets | `gcc -shared -fPIC -nostdlib -Wl,--whole-archive <own>.a -Wl,--no-whole-archive -lxml2 [-lxslt]`, setting DT_SONAME=libxslt.so / DT_SONAME=libexslt.so |
| `make test` extended | Verifies presence, DT_SONAME, and that the public API entry points appear in .dynsym |
| `.nanvix/z.py` `_BUILD_OUTPUTS` and `release()` | Ship both static and shared variants |
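The key detail in the new SHAREDLIB targets is that `--whole-archive` brackets only the library's own archive, while lower layers stay plain `-l` flags. A sketch of that argument layout in Python; `shared_link_argv` and the `.libs/*.a` paths are hypothetical, `-L` search paths are omitted, and using `-Wl,-soname,<name>` to set DT_SONAME is an assumption about how the Makefile does it (it is the standard GNU ld option).

```python
import shlex


def shared_link_argv(
    soname: str,
    own_archive: str,
    needed_libs: list[str],
    output: str,
    cc: str = "gcc",
) -> list[str]:
    """Build the link command for one .so in the DT_NEEDED chain.

    Only `own_archive` is wrapped in --whole-archive, so every object in it
    lands in the .so. Libraries in `needed_libs` are plain -l flags, so when
    the linker finds lib<name>.so it records a DT_NEEDED entry instead of
    copying code in.
    """
    return [
        cc, "-shared", "-fPIC", "-nostdlib",
        f"-Wl,-soname,{soname}",  # assumption: how DT_SONAME is set
        "-o", output,
        "-Wl,--whole-archive", own_archive, "-Wl,--no-whole-archive",
        *(f"-l{lib}" for lib in needed_libs),
    ]


if __name__ == "__main__":
    argv = shared_link_argv(
        soname="libexslt.so",
        own_archive="libexslt/.libs/libexslt.a",  # hypothetical path
        # Dependents before dependencies, matching the NEEDED order that
        # readelf reports for libexslt.so in the Validation section.
        needed_libs=["xslt", "xml2"],
        output="libexslt/.libs/libexslt.so",
    )
    print(shlex.join(argv))
```

Closing the bracket with `--no-whole-archive` before the `-l` flags matters: without it, a static fallback archive would also be pulled in whole.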

Validation

$ readelf -d libxslt/.libs/libxslt.so
 0x00000001 (NEEDED)  Shared library: [libxml2.so]
 0x0000000e (SONAME)  Library soname: [libxslt.so]

$ readelf -d libexslt/.libs/libexslt.so
 0x00000001 (NEEDED)  Shared library: [libxslt.so]
 0x00000001 (NEEDED)  Shared library: [libxml2.so]
 0x0000000e (SONAME)  Library soname: [libexslt.so]
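Checks like the ones above are easy to automate. A minimal sketch that extracts SONAME and the ordered NEEDED list from `readelf -d` text; `parse_dynamic` is a hypothetical helper, not part of the repo's `make test`.

```python
from __future__ import annotations

import re

# Matches dynamic-section lines such as
#   0x00000001 (NEEDED)  Shared library: [libxml2.so]
_DYN_RE = re.compile(r"\((NEEDED|SONAME)\)\s+.*\[([^\]]+)\]")


def parse_dynamic(readelf_output: str) -> tuple[str | None, list[str]]:
    """Return (SONAME, [NEEDED, ...]) parsed from `readelf -d` output."""
    soname: str | None = None
    needed: list[str] = []
    for line in readelf_output.splitlines():
        m = _DYN_RE.search(line)
        if not m:
            continue  # other tags (SYMTAB, STRTAB, ...) are ignored
        tag, value = m.groups()
        if tag == "SONAME":
            soname = value
        else:
            needed.append(value)  # keep link-order of NEEDED entries
    return soname, needed


SAMPLE = """\
 0x00000001 (NEEDED)  Shared library: [libxslt.so]
 0x00000001 (NEEDED)  Shared library: [libxml2.so]
 0x0000000e (SONAME)  Library soname: [libexslt.so]
"""

if __name__ == "__main__":
    print(parse_dynamic(SAMPLE))
```

Against a real build, the input would come from something like `subprocess.run(["readelf", "-d", path], capture_output=True, text=True, check=True).stdout`.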

The DT_NEEDED chain is correctly emitted; end-to-end resolution at runtime is validated downstream in esaurez/lxml#1 and the follow-up CPython integration.

Sequenced rollout

This PR's CI build needs libxml2.so present in the buildroot to emit the DT_NEEDED libxml2.so entry. That requires:

  1. Merge esaurez/libxml2#1.
  2. Cut a new esaurez/libxml2 release (the buildroot is populated from release tarballs).
  3. Bump this repo's nanvix.toml pin to that release.
  4. Then merge this PR.

Until step 3 lands, CI satisfies -lxml2 against the existing libxml2.a in the upstream release tarball. The linker then copies the referenced libxml2 objects into libxslt.so instead of recording a DT_NEEDED libxml2.so entry, so the build succeeds but does not yet realize the chain benefit. The end state expects libxml2.so to be present.
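Why the same `-lxml2` flag yields two different results comes down to GNU ld's documented lookup rule on ELF targets: with the default `-Bdynamic`, each search directory is checked for `lib<name>.so` before `lib<name>.a`. A small Python model of that rule; `resolve_lib` and `demo` are hypothetical, and a temporary directory stands in for the buildroot.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_lib(name: str, search_dirs: list[Path]) -> Path | None:
    """Model GNU ld's default (-Bdynamic) lookup for `-l<name>` on ELF:
    directories are tried in order, and within one directory the shared
    lib<name>.so wins over the static lib<name>.a."""
    for d in search_dirs:
        for candidate in (d / f"lib{name}.so", d / f"lib{name}.a"):
            if candidate.exists():
                return candidate
    return None


def demo() -> tuple[str, str]:
    """Resolve -lxml2 before and after libxml2.so appears in the buildroot."""
    with tempfile.TemporaryDirectory() as tmp:
        lib_dir = Path(tmp)  # stand-in for the buildroot's lib directory
        (lib_dir / "libxml2.a").touch()
        before = resolve_lib("xml2", [lib_dir])  # today's release tarball
        (lib_dir / "libxml2.so").touch()
        after = resolve_lib("xml2", [lib_dir])  # after the pin bump
        assert before is not None and after is not None
        return before.name, after.name


if __name__ == "__main__":
    print(demo())
```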

Runtime dependencies

  • esaurez/nanvix#27: .init_array invocation and DT_NEEDED chain walking in the user-space loader.
  • esaurez/nanvix#28: diamond DT_NEEDED handling. libexslt.so depends on both libxslt.so and libxml2.so, and libxslt.so itself depends on libxml2.so, so libxml2.so is reachable by two paths. This becomes a full diamond once consumers (e.g. liblxml_etree.so) layer on top.
  • esaurez/libxml2#1: libxml2.so ships in its release.
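The diamond matters for `.init_array` as well as loading: each library's initializers should run exactly once, and a dependency's initializers before its dependents'. One way to get that order is a post-order depth-first walk with a visited set. This Python sketch assumes that approach; whether esaurez/nanvix#27 and #28 implement it this way is not stated here, and `NEEDED` / `init_order` are hypothetical (the liblxml_etree.so entries are the ones listed for the lxml consumer).

```python
# Hypothetical DT_NEEDED graph: this PR's chain plus the lxml consumer.
NEEDED: dict[str, list[str]] = {
    "liblxml_etree.so": ["libxslt.so", "libexslt.so", "libxml2.so"],
    "libexslt.so": ["libxslt.so", "libxml2.so"],
    "libxslt.so": ["libxml2.so"],
    "libxml2.so": [],
}


def init_order(root: str, needed: dict[str, list[str]]) -> list[str]:
    """Return the order in which .init_array sections would run: every
    dependency before its dependents, each library exactly once, even
    when it is reachable through several paths. Cycles are not handled."""
    order: list[str] = []
    visited: set[str] = set()

    def visit(lib: str) -> None:
        if lib in visited:
            return  # already scheduled via another arm of the diamond
        visited.add(lib)
        for dep in needed.get(lib, []):
            visit(dep)
        order.append(lib)  # post-order: all dependencies come first

    visit(root)
    return order


if __name__ == "__main__":
    print(init_order("liblxml_etree.so", NEEDED))
```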

Companion PRs

Consumer

  • esaurez/cpython#11 — Phase 4 of the CPython .a → .so migration consumes this chain at runtime via dlopen of liblxml_etree.so.

Produce position-independent libxslt.so and libexslt.so alongside
the existing static .a archives, wired as a real DT_NEEDED chain
on top of esaurez/libxml2's libxml2.so:

  libxslt.so   -> NEEDED libxml2.so
  libexslt.so  -> NEEDED libxslt.so, NEEDED libxml2.so

Only each .so's own .a is embedded via --whole-archive; the lower
layers (libxml2, libz) are NOT bundled, so the Nanvix dynamic
loader pulls them in transitively at dlopen time. This eliminates
the multi-megabyte per-module duplication a self-contained build
would cause and exercises the DT_NEEDED chain support shipped in
esaurez/nanvix#27 in a real-world setting.

Concretely:

* `--with-pic`, `-fPIC` in CFLAGS — same .o files reusable for .a
  and .so.
* Keep `--disable-shared` (libtool has no rules for i686-nanvix);
  the .so files are linked manually with `-shared -fPIC -nostdlib`.
* The new SHAREDLIB targets use `-Wl,--whole-archive <own>.a
  -Wl,--no-whole-archive -lxml2 [-lxslt]`, setting
  DT_SONAME=libxslt.so / DT_SONAME=libexslt.so.
* `make test` extended to verify each .so has the expected SONAME
  and exports its public API entry point.
* `.nanvix/z.py` `_BUILD_OUTPUTS` and `release()` ship both static
  and shared variants.
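The extended `make test` checks that each .so exports its public API entry point. One way to express such a check over binutils `nm -D` output is sketched below in Python. The symbols the real test looks for are not named here, so `xsltApplyStylesheet` and `exsltRegisterAll` (both public libxslt/libexslt API) serve as stand-ins, and the helper names are hypothetical.

```python
from __future__ import annotations

# Hypothetical expected entry points; the real `make test` list may differ.
EXPECTED: dict[str, set[str]] = {
    "libxslt.so": {"xsltApplyStylesheet"},
    "libexslt.so": {"exsltRegisterAll"},
}


def defined_dynamic_symbols(nm_output: str) -> set[str]:
    """Collect defined symbol names from `nm -D` output.

    Defined entries look like '00012345 T xsltApplyStylesheet'; undefined
    ones such as '         U xmlFree' have no address column and are
    skipped (their split() yields only two fields).
    """
    names: set[str] = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] != "U":
            names.add(fields[2])
    return names


def missing_exports(lib: str, nm_output: str) -> set[str]:
    """Expected entry points for `lib` that are absent from .dynsym."""
    return EXPECTED.get(lib, set()) - defined_dynamic_symbols(nm_output)
```

An empty result from `missing_exports` means every expected symbol is exported; undefined references to libxml2 (the `U` lines) are deliberately ignored, since those are what the DT_NEEDED chain resolves.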

Sizes (stripped, DT_NEEDED chain vs the discarded self-contained
prototype):

  libxslt.so   296 KB (was 1.8 MB)
  libexslt.so   92 KB (was 1.9 MB)

Runtime dependencies:

* esaurez/nanvix#27 — `.init_array` invocation + DT_NEEDED chain
  walking in the user-space loader.
* esaurez/libxml2#1 — the libxml2.so this PR's binaries reference
  must be present in the buildroot. This implies a sequenced
  rollout: merge esaurez/libxml2#1 first, cut a new
  nanvix/libxml2 release, then this PR's CI build can resolve
  `-lxml2` to libxml2.so. Until then, CI continues to satisfy
  `-lxml2` against the existing libxml2.a in the release tarball,
  which produces a libxslt.so without a DT_NEEDED libxml2.so
  entry. The end-state expects libxml2.so to be present.

End-to-end validation (DT_NEEDED chain successfully resolved by
the Nanvix loader at dlopen time) is performed downstream in
esaurez/lxml#1 and the CPython lxml integration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
esaurez force-pushed the feat/build-shared-library branch from 20f4179 to a6be239 on June 4, 2026 17:01
esaurez pushed a commit to esaurez/lxml that referenced this pull request Jun 4, 2026
Produce position-independent liblxml_etree.so and
liblxml_elementpath.so alongside the existing static archives,
wired as a real DT_NEEDED chain on top of esaurez/libxml2 +
esaurez/libxslt:

  liblxml_etree.so       -> NEEDED libxslt.so, libexslt.so, libxml2.so
  liblxml_elementpath.so -> (pure-Cython, no native deps)

Only the Cython-generated lxml.etree.c is embedded in
liblxml_etree.so; libxslt, libxml2, and libz live in their own
.so files and are pulled in transitively by the Nanvix dynamic
loader at dlopen time. This exercises the DT_NEEDED chain support
shipped in esaurez/nanvix#27 in a real-world setting and
eliminates the multi-megabyte per-module duplication that a
self-contained build would cause.

Concretely:

* `-fPIC` is added to the per-source compile commands, so the
  same .o files are usable for both .a and .so.
* Two new SHAREDLIB targets link via `-shared -fPIC -nostdlib
  -Wl,--whole-archive <own>.a -Wl,--no-whole-archive [-lxslt
  -lexslt -lxml2]`, setting DT_SONAME=liblxml_etree.so /
  DT_SONAME=liblxml_elementpath.so.
* `.nanvix/z.py` `output_files` and the Makefile's `package` /
  `verify-package` targets ship both the static and shared
  variants.

Sizes (stripped, DT_NEEDED chain vs the discarded self-contained
prototype):

  liblxml_etree.so       1.7 MB (was 3.5 MB)
  liblxml_elementpath.so 157 KB (was 153 KB; pure-Cython, no deps)

Runtime dependencies:

* esaurez/nanvix#27 — `.init_array` invocation + DT_NEEDED chain
  walking in the user-space loader.
* esaurez/libxml2#1 + esaurez/libxslt#1 — libxml2.so, libxslt.so,
  and libexslt.so must be present in the buildroot. This implies
  a sequenced rollout: merge libxml2#1 -> release -> bump libxslt's
  pin -> merge libxslt#1 -> release -> bump this PR's pins ->
  merge this PR.

End-to-end validation (DT_NEEDED chain resolved by the Nanvix
loader: liblxml_etree.so -> libxslt.so -> libxml2.so) will land
in a follow-up against esaurez/cpython#11. CPython's Phase 4 will
switch from the MODLIBS-piggyback workaround to a clean dlopen
of liblxml_etree.so, letting python.elf shrink by ~3 MB.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
esaurez (Owner, Author) commented Jun 9, 2026

Superseded by upstream PR nanvix#75, which carries the same .so build addition. That PR is rebased onto the current upstream version branch and trimmed in scope: the CI workflow downgrade, nanvix.toml version downgrade, .gitignore tweaks, .zutils-version downgrade, and z.ps1/z.sh additions were all dropped as unrelated environment drift. It also removes the esaurez/*, python.elf, and nanvix-todo references from the commit message and PR body. Closing this fork PR; tracking continues upstream.
