@mho22 (Collaborator) commented Sep 3, 2025

Motivation for the change, related issues

This pull request adds dynamic loading of the Intl extension to PHP.wasm Web.

Related issues and pull requests

Issues

Pull requests

Implementation details

  • Removal of static Intl options in PHP compilation
  • Set up of PHP as a MAIN_MODULE in node and web
  • Correction of PHP: Do not pull WebGL in Playground web #2318 by adding worker to the web environment
  • Improvement of build file for shared libraries
  • Implementation of Intl dynamic extension lazy loading logic in PHP.wasm web
  • Creation of an ignore-lib-imports Vite plugin
  • Cypress E2E tests implementation for PHP.wasm web by duplicating existing ones from PHP.wasm Node
  • Creation of a virtual alias for wasm-feature-detect to simulate JSPI mode enabled based on Cypress ENV
  • CI jobs implementation to test PHP.wasm web in JSPI and Asyncify mode
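The wasm-feature-detect alias can be pictured as a virtual module that replaces the real package at build time. The sketch below assumes a Rollup/Vite-style plugin object (resolveId/load hooks) and that the code under test only imports the jspi named export. The plugin name, the virtual module id and the boolean flag are hypothetical; the actual implementation in this PR may differ.

```typescript
// Hypothetical sketch: names are illustrative, not the PR's actual plugin.
// Follows the Rollup/Vite plugin hook shape (resolveId + load) without importing vite.
const VIRTUAL_ID = '\0virtual:wasm-feature-detect'; // "\0" marks a virtual module by convention

interface VirtualAliasPlugin {
  name: string;
  enforce: 'pre';
  resolveId(source: string): string | null;
  load(id: string): string | null;
}

function wasmFeatureDetectAlias(jspiEnabled: boolean): VirtualAliasPlugin {
  return {
    name: 'virtual-wasm-feature-detect',
    enforce: 'pre',
    resolveId(source) {
      // Redirect every import of the real package to the virtual module.
      return source === 'wasm-feature-detect' ? VIRTUAL_ID : null;
    },
    load(id) {
      if (id !== VIRTUAL_ID) return null;
      // Only `jspi` is stubbed; any other export the app imports would need its own stub.
      return `export const jspi = async () => ${JSON.stringify(jspiEnabled)};`;
    },
  };
}
```

In a Vite config this would be added to plugins with the flag read from the test environment (for example a hypothetical CYPRESS_JSPI variable), so the same bundle can be exercised in JSPI and Asyncify mode.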

Testing Instructions (or ideally a Blueprint)

CI

🧪 test-e2e-php-wasm-web-jspi
🧪 test-e2e-php-wasm-web-asyncify

Next steps

  • Experimental PHP.wasm Node JSPI 8.3
  • PHP.wasm Node JSPI
  • PHP.wasm Node Asyncify
  • Experimental PHP.wasm Web JSPI 8.3
  • Experimental PHP.wasm Web Asyncify 8.3
  • PHP.wasm Web JSPI
  • PHP.wasm Web Asyncify
  • Remove artifacts in PHP.wasm
  • Remove artifacts in Playground
  • Move Xdebug in shared directory alongside Intl

@mho22 force-pushed the add-intl-dynamic-extension-support-to-php-wasm-web branch 2 times, most recently from ce5ac2f to 9915b90 (September 9, 2025 14:53)
@mho22 (Collaborator, Author) commented Sep 9, 2025

A short summary:

  1. I first tried again to build PHP and Intl with MAIN_MODULE for Web. It was a success.
  2. I then remembered that something broke when MAIN_MODULE was enabled (PHP: Do not pull WebGL in Playground web #2318). I tried to reproduce the "document is not defined" error and easily did so by running npm run dev.

  3. I then tried setting MAIN_MODULE and SIDE_MODULE to 2 instead of the default 1. It failed: each file compiled successfully, but I always got the same error:

PHP Startup: Invalid library (maybe not a PHP library) '/internal/shared/extensions/intl.so'

I eventually found that the resulting intl.so file was 5.6 MB when compiled with SIDE_MODULE=1 but only 78 bytes with SIDE_MODULE=2. I concluded that the MAIN_MODULE was then responsible for keeping that code, so this was probably not the solution.

  4. I went back to MAIN_MODULE=1 and tried to fix the "document is not defined" issue, which I first did by replacing the problematic code:
```diff
-  var specialHTMLTargets = [0, document, window];
+  var specialHTMLTargets = [0, typeof document != 'undefined' ? document : 0, typeof window != 'undefined' ? window : 0];
   /** @suppress {duplicate } */
   var findEventTarget = (target) => {
       target = maybeCStringToJsString(target);
-      var domElement = specialHTMLTargets[target] || document.querySelector(target);
+      var domElement = specialHTMLTargets[target] || (typeof document != 'undefined' ? document.querySelector(target) : null);
       return domElement;
     };
```

With that change, npm run dev no longer crashed.

  5. I kept looking for specialHTMLTargets in the Emscripten repository and found these lines in libhtml5.js:

```js
#if ENVIRONMENT_MAY_BE_WORKER || ENVIRONMENT_MAY_BE_NODE || ENVIRONMENT_MAY_BE_SHELL || PTHREADS
  $specialHTMLTargets: "[0, typeof document != 'undefined' ? document : 0, typeof window != 'undefined' ? window : 0]",
#else
  $specialHTMLTargets: "[0, document, window]",
#endif
```

where ENVIRONMENT_MAY_BE_WORKER is derived from the ENVIRONMENT setting in Emscripten's Python code:

```python
settings.ENVIRONMENT_MAY_BE_WORKER = not settings.ENVIRONMENT or 'worker' in settings.ENVIRONMENT
```

 
So I added worker to the ENVIRONMENT setting, and npm run dev ran successfully again.
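To see why adding worker changes the emitted code, here is a small TypeScript model of the two snippets above (the Python setting plus the #if in libhtml5.js). It is an illustration, not Emscripten's code, and it assumes ENVIRONMENT is a comma-separated list such as "web" or "web,worker".

```typescript
// Illustrative model of Emscripten's ENVIRONMENT_MAY_BE_* logic (not Emscripten's code).
// An empty ENVIRONMENT means "any environment", so every flag is true.
function environmentMayBe(environment: string, name: 'web' | 'worker' | 'node' | 'shell'): boolean {
  const envs = environment.split(',').map((e) => e.trim()).filter((e) => e.length > 0);
  return envs.length === 0 || envs.includes(name);
}

// Which specialHTMLTargets initializer libhtml5.js would emit for a given ENVIRONMENT.
function specialHTMLTargetsSource(environment: string, pthreads = false): string {
  const guarded =
    environmentMayBe(environment, 'worker') ||
    environmentMayBe(environment, 'node') ||
    environmentMayBe(environment, 'shell') ||
    pthreads;
  return guarded
    ? "[0, typeof document != 'undefined' ? document : 0, typeof window != 'undefined' ? window : 0]"
    : '[0, document, window]';
}
```

With "web" alone the unguarded array is emitted and throws when document is missing; with "web,worker" the guarded array is emitted. On a regular page the typeof checks succeed, so the guarded form evaluates to the same [0, document, window] there.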
 
@adamziel WDYT? Is this the right approach, or do you think I should investigate MAIN_MODULE=2 further?
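For reference, the patched lookup from step 4 can be rendered as standalone TypeScript. This is a simplified sketch: numeric targets index the special targets and string targets go through querySelector (the real code first converts C-string pointers with maybeCStringToJsString). The Globals type and the injectable globals parameter are illustrative additions for testing, not part of Emscripten.

```typescript
// Simplified, standalone rendering of the guarded Emscripten lookup (illustration only).
type Globals = {
  document?: { querySelector(selector: string): unknown };
  window?: unknown;
};

function makeSpecialHTMLTargets(g: Globals): unknown[] {
  // Index 0 = no target, 1 = document, 2 = window; 0 stands in for a missing global.
  return [0, typeof g.document != 'undefined' ? g.document : 0, typeof g.window != 'undefined' ? g.window : 0];
}

function findEventTarget(target: number | string, g: Globals = globalThis as unknown as Globals): unknown {
  const specialHTMLTargets = makeSpecialHTMLTargets(g);
  if (typeof target === 'number') {
    return specialHTMLTargets[target] || null;
  }
  // Only call querySelector when a document exists (it does not in a worker or in Node).
  const doc = g.document;
  return typeof doc != 'undefined' ? doc.querySelector(target) : null;
}
```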

@mho22 (Collaborator, Author) commented Sep 9, 2025

I was also thinking of adding php-wasm-web tests using Vitest in a jsdom environment, with a vitest.setup.ts file that emulates fetch with fs.

@adamziel (Collaborator) commented Sep 9, 2025

So I added worker to the ENVIRONMENT variable and it successfully ran npm run dev again.

Good find! It should be fine as long as we're not breaking loading that script on a regular web page (not in a worker). And if we are breaking it, that may still be fine, but let's acknowledge that and discuss any consequences.

@adamziel (Collaborator) commented Sep 9, 2025

I had in mind to also try to add php-wasm-web tests using vitest in jsdom environment and use a vitest.setup.ts file that will emulate fetch with fs.

Let's just use a dedicated E2E testing setup. I've tried jsdom in the past in this repo, and it notoriously failed to simulate the browser environment or to catch any real errors.

@mho22 force-pushed the add-intl-dynamic-extension-support-to-php-wasm-web branch 3 times, most recently from c511369 to 52fe96a (September 14, 2025 14:29)
@mho22 force-pushed the add-intl-dynamic-extension-support-to-php-wasm-web branch from f244496 to f54d3c5 (September 15, 2025 12:33)
@mho22 (Collaborator, Author) commented Sep 15, 2025

@adamziel

Good find! It should be fine as long as we're not breaking loading that script on a regular web page (not in a worker). And if we are breaking it, that may still be fine, but let's acknowledge that and discuss any consequences.

The environment is now web,worker instead of only web, so I suppose the previous behavior for web pages is now merged with the worker behavior?

Let's just use a specific E2E testing setup. I've tried that in the past in this repo and jsdom was notoriously failing to simulate the browser environment or catch any real errors.

I implemented a php-wasm-web:e2e Nx command based on the playground-website:e2e one, and it runs successfully in CI! In another pull request, we could maybe duplicate the test files from Node to Web, replacing everything related to Vitest with Cypress?

I also managed to run the tests in JSPI and Asyncify mode separately, using Chrome for JSPI and Electron for Asyncify. I guess we could also run the Asyncify tests in Chrome with a different setup. On it.

@mho22 (Collaborator, Author) commented Sep 15, 2025

In the end, I only removed the close() call, and only for Node, because it made the last test crash in the Node versions only:

```dockerfile
if [ "$PLATFORM" = "node" ]; then \
	# Calling close() on a file descriptor acquired through WASI syscalls can trigger a JS call/await
	# during a non-resumable C++ stack frame, leading to "RuntimeError: trying to suspend JS frames".
	# Since ICU maps the file into memory and does not require the descriptor after mapping
	# under our build context, skipping close(fd) avoids that suspension error.
	# NOTE: This means the file descriptor will remain open until process teardown.
	# This is acceptable here because ICU data files are loaded only once at startup.
	/root/replace.sh 's/close\(fd\);//' /root/icu/source/common/umapfile.cpp; \
fi; \
```

I tried adding close to JSPI_IMPORTS and JSPI_EXPORTS, but the close call comes from ICU code, not from PHP's Intl extension. Like an extension within an extension.

@mho22 marked this pull request as ready for review September 15, 2025 21:23
@mho22 requested a review from a team as a code owner September 15, 2025 21:23
@mho22 (Collaborator, Author) commented Sep 15, 2025

@adamziel That's it I think! The first full dynamic extension with its associated tests in Node, Web and Playground.

I will clean up the old artifacts from static Intl and Playground CLI in the next pull request, to keep this one clean.

Should I keep the withICU option in php-wasm-web's load-runtime for backwards compatibility?
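If the option is kept, one low-cost pattern is a deprecation shim that still accepts withICU but maps it onto whatever the new loader uses. This is only a sketch: withICU comes from the question above, while loadIntlExtension and the LoadRuntimeOptions shape are hypothetical stand-ins, not php-wasm-web's real API.

```typescript
// Hypothetical option names: `loadIntlExtension` stands in for however the new
// loader enables the Intl dynamic extension; only `withICU` comes from this thread.
interface LoadRuntimeOptions {
  loadIntlExtension?: boolean;
  /** @deprecated Intl is now a dynamic extension; kept so existing callers keep working. */
  withICU?: boolean;
}

function normalizeLoadRuntimeOptions(
  options: LoadRuntimeOptions,
  warn: (message: string) => void = console.warn
): { loadIntlExtension: boolean } {
  if (options.withICU !== undefined) {
    warn('withICU is deprecated; use loadIntlExtension instead.');
  }
  // An explicit new option wins; otherwise fall back to the legacy flag, then to false.
  return { loadIntlExtension: options.loadIntlExtension ?? options.withICU ?? false };
}
```

The warning gives callers a migration path, and the legacy flag can be dropped in a later major release.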

@mho22 (Collaborator, Author) commented Sep 17, 2025

Currently, only test-e2e-php-wasm-web-asyncify passes; something is blocking JSPI. The resolvePHP function in load-php-runtime.ts never gets called. In fact, onRuntimeInitialized() isn't triggered at all. I'll investigate and figure out what's going on.
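A hang like this is easier to diagnose when the E2E run fails fast instead of timing out silently. A minimal sketch, assuming the loader passes a module-arguments object to the Emscripten factory (Emscripten does call Module.onRuntimeInitialized once the runtime is ready); EmscriptenModuleArgs and waitForRuntimeInitialized are hypothetical names, not php-wasm's real types.

```typescript
// Hypothetical minimal type; php-wasm's real module arguments have many more fields.
interface EmscriptenModuleArgs {
  onRuntimeInitialized?: () => void;
}

// Resolves when Emscripten calls onRuntimeInitialized, rejects after `timeoutMs`.
function waitForRuntimeInitialized(moduleArgs: EmscriptenModuleArgs, timeoutMs: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`onRuntimeInitialized was not called within ${timeoutMs} ms`));
    }, timeoutMs);
    const previous = moduleArgs.onRuntimeInitialized;
    moduleArgs.onRuntimeInitialized = () => {
      clearTimeout(timer);
      previous?.(); // keep any callback that was already registered
      resolve();
    };
  });
}
```

This only turns the hang into an explicit error; it does not explain why the JSPI build never reaches initialization.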

@adamziel (Collaborator)

Thank you @mho22!

@mho22 (Collaborator, Author) commented Sep 18, 2025

I had to upgrade Playwright from 1.47.1 to 1.53.2, since the new version includes Chrome 137, which supports JSPI. Unfortunately, the test-e2e-playwright tests now fail. I guess I'll need to update those tests.

@adamziel (Collaborator)

The error says:

4539 pixels (ratio 0.01 of all image pixels) are different.

This is the diff:

[Screenshot: website-old-diff]

I think we can just increase the threshold. It doesn't seem like an actual failure. It's weird that it would happen in this PR of all PRs 🤷

@adamziel (Collaborator)

Try putting 10_000 here:

https://github.com/WordPress/wordpress-playground/blob/trunk/packages/playground/website/playwright/e2e/deployment.spec.ts#L18

@mho22 (Collaborator, Author) commented Sep 18, 2025

🎉

@adamziel (Collaborator)

I would expect these .wasm binaries to get smaller, but they're significantly larger. Do we know what happened there? Is there a way we can ship this PR without frontloading an additional 8 MB of data on playground.wordpress.net?


@mho22 (Collaborator, Author) commented Sep 18, 2025

I guess this is because of MAIN_MODULE. But you're right, 8 MB is significantly more than expected.

@adamziel (Collaborator)

What does it add to the binary? Are those additions relevant? It worked without them earlier, and it was smaller despite shipping an additional PHP extension. Can we post-process the wasm binary and remove the extra stuff? Or get Emscripten not to include it in the first place?

@mho22 (Collaborator, Author) commented Sep 18, 2025

I'm currently listing the different steps and possibilities. I already tried MAIN_MODULE=2, and one of the tests passed. This doesn't help us a lot, but saving 2.6 MB is better than nothing:

 packages/php-wasm/web/public/php/jspi/8_3_25/php_8_3.wasm  | Bin 24607385 -> 22006934 bytes
 packages/php-wasm/web/public/php/jspi/php_8_3.js           | Bin 577829 -> 161805 bytes

I'm still investigating.

@mho22 (Collaborator, Author) commented Sep 19, 2025

Ok. I found something interesting.

First of all, why does MAIN_MODULE add megabytes to the build?

  1. MAIN_MODULE bundles system libraries into the binary so that side modules can link against them if they need them.
  2. MAIN_MODULE disables dead code elimination.

Based on this, I found two ways to decrease the wasm file size while keeping the MAIN_MODULE option:

  1. Disable unused system libraries with the EMCC_FORCE_STDLIBS environment variable.
  2. Use normal dead code elimination by setting -s MAIN_MODULE=2 and explicitly listing EXPORTED_FUNCTIONS.

But to be honest, neither option was satisfying enough. So I benchmarked the size and composition of the wasm binary, with and without MAIN_MODULE.

I hope this will be readable.

[ n.b. GD needs LIBZIP, CURL needs OPENSSL and LIBZIP, and OPENSSL needs MBSTRING and MBREGEX to build individually ]

 

| Module | WASM file (MB) | JS file (kB) | Difference vs. vanilla binary (MB) |
|---|---|---|---|
| PHP without MAIN_MODULE | 3.96 | 141 | - |
| MBREGEX | 3.97 | 141 | +0.01 |
| MBSTRING | 4.91 | 141 | +0.95 |
| LIBZIP | 4.18 | 141 | +0.22 |
| OPCACHE | 4.36 | 141 | +0.40 |
| NETWORKING | 3.96 | 141 | +0.01 |
| EXIF | 4.96 | 141 | +1.00 |
| LIBXML | 5.17 | 141 | +1.21 |
| ICONV | 4.88 | 141 | +0.92 |
| FILEINFO | 5.29 | 141 | +1.33 |
| SQLITE | 5.34 | 141 | +1.38 |
| CLI_SAPI | 4.07 | 141 | +0.11 |
| LIBZIP + GD | 5.32 | 141 | +1.36 (0.22 + 1.14) |
| LIBZIP + OPENSSL + CURL | 6.00 | 141 | +2.04 (0.22 + 2.04 + 0.00) |
| MBREGEX + MBSTRING + OPENSSL | 6.96 | 141 | +3.00 (0.01 + 0.95 + 2.04) |

 

Now with MAIN_MODULE:

| Module | WASM file (MB) | JS file (kB) | Difference vs. vanilla binary (MB) |
|---|---|---|---|
| PHP with MAIN_MODULE | 6.61 | 570 | - |
| MBREGEX | 7.17 | 570 | +0.57 |
| MBSTRING | 7.70 | 570 | +1.10 |
| LIBZIP | 6.88 | 570 | +0.28 |
| OPCACHE | 6.76 | 570 | +0.16 |
| NETWORKING | 6.61 | 577 | +0.01 |
| EXIF | 7.76 | 570 | +1.16 |
| LIBXML | 8.00 | 570 | +1.40 |
| ICONV | 7.54 | 570 | +0.94 |
| FILEINFO | 14.66 | 570 | +8.06 |
| SQLITE | 8.08 | 570 | +1.48 |
| CLI_SAPI | 6.76 | 570 | +0.16 |
| LIBZIP + GD | 8.20 | 570 | +1.60 (0.28 + 1.32) |
| LIBZIP + OPENSSL + CURL | 9.40 | 570 | +2.80 (0.28 + 2.24 + 0.28) |
| MBREGEX + MBSTRING + OPENSSL | 10.51 | 570 | +3.91 (0.57 + 1.10 + 2.24) |

 

Now a comparison of the size increase for each individual static extension:

| Module | Increase without MAIN_MODULE (MB) | Increase with MAIN_MODULE (MB) | Difference (MB) |
|---|---|---|---|
| MBREGEX | +0.01 | +0.57 | +0.56 |
| MBSTRING | +0.95 | +1.10 | +0.15 |
| LIBZIP | +0.22 | +0.28 | +0.06 |
| OPCACHE | +0.40 | +0.16 | -0.24 |
| NETWORKING | +0.01 | +0.01 | +0.00 |
| EXIF | +1.00 | +1.16 | +0.16 |
| LIBXML | +1.21 | +1.40 | +0.19 |
| ICONV | +0.92 | +0.94 | +0.02 |
| FILEINFO | +1.33 | +8.06 | +6.73 |
| SQLITE | +1.38 | +1.48 | +0.10 |
| CLI_SAPI | +0.11 | +0.16 | +0.05 |
| LIBZIP + GD | +1.36 | +1.60 | +0.18 |
| LIBZIP + OPENSSL + CURL | +2.04 | +2.80 | +0.76 |
| MBREGEX + MBSTRING + OPENSSL | +3.00 | +3.91 | +0.91 |
  1. I noticed the spectacular +6.73 MB increase when enabling FILEINFO in the build.
  2. Adding MAIN_MODULE to an empty build adds 2.65 MB, probably because of the system libraries.
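The difference column can be recomputed mechanically from the two "increase" columns, which makes outliers easy to spot when the benchmark is rerun. The numbers below are copied from the comparison table; the helper names and the 1 MB threshold are illustrative choices.

```typescript
// Per-extension size increase in MB, copied from the comparison table:
// [increase without MAIN_MODULE, increase with MAIN_MODULE].
const increases: Record<string, [number, number]> = {
  MBREGEX: [0.01, 0.57],
  MBSTRING: [0.95, 1.1],
  LIBZIP: [0.22, 0.28],
  OPCACHE: [0.4, 0.16],
  NETWORKING: [0.01, 0.01],
  EXIF: [1.0, 1.16],
  LIBXML: [1.21, 1.4],
  ICONV: [0.92, 0.94],
  FILEINFO: [1.33, 8.06],
  SQLITE: [1.38, 1.48],
  CLI_SAPI: [0.11, 0.16],
};

// Extra cost of MAIN_MODULE per extension, rounded to 0.01 MB to hide float noise.
function mainModuleOverhead(table: Record<string, [number, number]>): Record<string, number> {
  const result: Record<string, number> = {};
  for (const [name, [without, withMain]] of Object.entries(table)) {
    result[name] = Math.round((withMain - without) * 100) / 100;
  }
  return result;
}

// Extensions whose MAIN_MODULE overhead exceeds the threshold are worth investigating.
function outliers(overhead: Record<string, number>, thresholdMb: number): string[] {
  return Object.keys(overhead).filter((name) => overhead[name] > thresholdMb);
}
```

With a 1 MB threshold only FILEINFO is flagged, and OPCACHE comes out negative (it grows less with MAIN_MODULE than without).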

Ignoring FILEINFO, the overall additional size from all the enabled extensions is 2.19 MB.

Why does FILEINFO add that many megabytes? IIUC, this is related to the use of MAIN_MODULE (obviously): since PHP becomes a standalone module, the static extensions have to be built with their related data. Like INTL, FILEINFO needs data; as a static library, it accesses that data through local system libraries, but with MAIN_MODULE it has to be built with its library inside the binary.

I don't know if this is feasible right now, but I would suggest turning FILEINFO into a dynamic extension. This would probably free up to 8 MB, while MAIN_MODULE adds about 5 MB.

Summary:

Current PHP.wasm Web without MAIN_MODULE, with FILEINFO, with INTL: 16.1 MB
Possible PHP.wasm Web with MAIN_MODULE, without FILEINFO, without INTL: 16.56 MB
Possible PHP.wasm Web with MAIN_MODULE=2, without FILEINFO, without INTL: 13.96 MB

That sounds promising, right?

@adamziel (Collaborator) commented Sep 19, 2025

It does sound promising, thank you for this great research!

I don't know if this is feasible right now, but I would suggest turning FILEINFO into a dynamic extension. This would probably free up to 8 MB, while MAIN_MODULE adds about 5 MB.

The way I understand it, we only need the system libraries that were already shipped before this PR. Whatever makes up the additional FILEINFO-related 8 MB or the MAIN_MODULE-related 5 MB can probably be slashed.

It sounds like these are additional libraries loaded just in case some dynamic library tries to load them later on, but we know there is no dynamic library that will need them later because we're just splitting php.wasm into php.wasm + intl.so. The current system libraries are enough today, so they should also be enough for the dynamic library.

Can we inspect the built wasm file for the functions it ships, diff that with the wasm file we have today, and just blanket remove all the additional functions?

@mho22 (Collaborator, Author) commented Sep 30, 2025

  1. During the meetup, @adamziel and I discussed this pull request and what exactly was adding these extra 8 MB of data. The first element: because of MAIN_MODULE, the data file needed by fileinfo, normally available outside of the wasm file, had to be added inside the wasm file. To fix that, I emptied the array with these two lines:

```diff
 # Remove fileinfo if needed
 RUN if [ "$WITH_FILEINFO" = "yes" ]; \
     then \
 		echo -n ' --enable-fileinfo' >> /root/.php-configure-flags; \
+		rm /root/php-src/ext/fileinfo/data_file.c; \
+		echo -e 'const unsigned char php_magic_database[0] = {};' > /root/php-src/ext/fileinfo/data_file.c; \
 	else \
 		# light bundle should compile without fileinfo and libmagic
 		echo -n ' --disable-fileinfo' >> /root/.php-configure-flags; \
 	fi;
```

The problem is that, even if fileinfo compiles correctly, it won't work anymore, since it needs this huge php_magic_database variable. So I would suggest making fileinfo a dynamic extension instead, before continuing with this pull request.

  2. The second element was that MAIN_MODULE=1 exports a ton of functions instead of only the ones we need. I then managed to build every PHP version with MAIN_MODULE=2, which required each version to have its own specific EXPORTED_FUNCTIONS list for the intl dynamic extension tests to pass. Here is the list:

GLOBAL TO ALL PHP VERSIONS:

"_zend_string_init_interned", \n\
"___cxa_pure_virtual", \n\
"_executor_globals", \n\
"_std_object_handlers", \n\
"_zend_empty_string", \n\
"_timezone", \n\
"_tzname", \n\
"_zend_ce_aggregate", \n\
"__ZTVN10__cxxabiv120__si_class_type_infoE", \n\
"__ZTVN10__cxxabiv117__class_type_infoE", \n\
"__ZTVN10__cxxabiv121__vmi_class_type_infoE", \n\
"_zend_ce_exception", \n\
"_OnUpdateStringUnempty", \n\
"_OnUpdateLong", \n\
"_OnUpdateBool", \n\
"_zend_ini_boolean_displayer_cb", \n\
"_zend_ce_countable", \n\
"_zend_ce_iterator", \n\
"__ZNSt12length_errorD1Ev", \n\
"__ZTISt12length_error", \n\
"__ZTVSt12length_error", \n\
"__ZNSt20bad_array_new_lengthD1Ev", \n\
"__ZTISt20bad_array_new_length", \n\
"_zval_add_ref", \n\
"_free", \n\
"_object_init_ex", \n\
"__emalloc", \n\
"_object_properties_init", \n\
"_strlen", \n\
"_strstr", \n\
"_strchr", \n\
"__ZNSt3__211__call_onceERVmPvPFvS2_E", \n\
"__ZNSt3__25mutex4lockEv", \n\
"__ZNSt3__25mutex6unlockEv", \n\
"__ZNSt3__218condition_variable10notify_allEv", \n\
"_strcmp", \n\
"_strcpy", \n\
"_strncmp", \n\
"_strrchr", \n\
"_strncpy", \n\
"_getenv", \n\
"_setlocale", \n\
"_stat", \n\
"_open", \n\
"_mmap", \n\
"_close", \n\
"_memcmp", \n\
"_realloc", \n\
"_convert_to_double", \n\
"__safe_emalloc", \n\
"__efree", \n\
"_convert_to_long", \n\
"_zend_known_strings", \n\
"_zend_empty_array", \n\
"_compiler_globals", \n\
"_zend_add_attribute", \n\
"_zend_register_ini_entries", \n\
"_zend_register_ini_entries_ex", \n\
"_zend_declare_class_constant_long", \n\
"_zend_declare_class_constant_null", \n\
"_zend_declare_class_constant_string", \n\
"_zend_register_long_constant", \n\
"_zend_declare_class_constant_double", \n\
"_zend_register_internal_class_with_flags", \n\
"_zend_declare_typed_class_constant", \n\
"_zend_register_string_constant", \n\
"_zend_register_internal_class_ex", \n\
"_zend_unregister_ini_entries_ex", \n\
"_zend_throw_exception_ex", \n\
"_zend_strpprintf", \n\
"_zend_object_std_dtor", \n\
"_zend_object_std_init", \n\
"_zend_declare_class_constant_ex", \n\
"_zend_parse_method_parameters", \n\
"_zend_throw_error", \n\
"_zend_sort", \n\
"_zend_hash_sort_ex", \n\
"__zend_new_array_0", \n\
"_zend_hash_next_index_insert", \n\
"_zend_hash_update", \n\
"_zend_hash_index_update", \n\
"_zend_error", \n\
"_zend_parse_parameters", \n\
"_zend_replace_error_handling", \n\
"_zend_spprintf", \n\
"_zend_throw_exception", \n\
"_zend_restore_error_handling", \n\
"_zend_strtod", \n\
"_zend_wrong_parameters_none_error", \n\
"_zend_try_assign_typed_ref_long", \n\
"_zend_fcall_info_init", \n\
"_zend_array_destroy", \n\
"_zend_argument_value_error", \n\
"_zend_hash_str_find", \n\
"_zend_objects_clone_members", \n\
"_zend_call_function", \n\
"_zend_try_assign_typed_ref_str", \n\
"___zend_malloc", \n\
"_zend_alter_ini_entry", \n\
"_zend_argument_type_error", \n\
"_zend_hash_destroy", \n\
"_zend_memnstr_ex", \n\
"_zend_str_tolower", \n\
"_zend_create_internal_iterator_zval", \n\
"_zend_class_implements", \n\
"_zend_iterator_init", \n\
"_zend_update_property", \n\
"_zend_declare_typed_property", \n\
"_zend_wrong_parameters_count_error", \n\
"_zend_wrong_parameter_error", \n\
"_zend_parse_arg_str_slow", \n\
"_zend_parse_arg_long_slow", \n\
"_zend_parse_arg_str_or_long_slow", \n\
"_zend_release_fcall_info_cache", \n\
"_zend_try_assign_typed_ref_arr", \n\
"___zend_realloc", \n\
"_zend_objects_store_del", \n\
"_zend_get_gc_buffer_create", \n\
"_zend_get_gc_buffer_grow", \n\
"_zend_std_get_properties", \n\
"_zend_hash_index_find", \n\
"__zend_hash_init", \n\
"_zend_hash_str_update", \n\
"_zend_call_known_function", \n\
"_zend_std_compare_objects", \n\
"_zend_try_assign_typed_ref_bool", \n\
"_zend_hash_copy", \n\
"__zend_new_array", \n\
"_zend_argument_error", \n\
"_zend_argument_count_error", \n\
"_zend_call_method", \n\

ONLY PHP8.4:

"_zend_register_internal_class_with_flags", \n\
"_zend_declare_typed_class_constant", \n\

ABOVE PHP8.0:

"_zend_unregister_ini_entries_ex", \n\
"_zend_register_ini_entries_ex", \n\

ABOVE PHP7.4:

"_zend_add_attribute", \n\
"_zend_argument_count_error", \n\
"_zend_argument_error", \n\
"_zend_argument_type_error", \n\
"_zend_argument_value_error", \n\
"_zend_call_known_function", \n\
"_zend_create_internal_iterator_zval", \n\
"_zend_get_gc_buffer_create", \n\
"_zend_get_gc_buffer_grow", \n\
"_zend_parse_arg_str_or_long_slow", \n\
"_zend_wrong_parameter_error", \n\

ABOVE PHP7.3:

"_zend_declare_typed_property", \n\
"_zend_release_fcall_info_cache", \n\
"_zend_try_assign_typed_ref_arr", \n\
"_zend_try_assign_typed_ref_bool", \n\
"_zend_try_assign_typed_ref_long", \n\
"_zend_try_assign_typed_ref_str", \n\

ABOVE PHP7.2:

"__zend_new_array", \n\
"__zend_new_array_0", \n\
"_object_init_ex", \n\
"_zend_empty_array", \n\
"_zend_hash_index_update", \n\
"_zend_hash_next_index_insert", \n\
"_zend_hash_str_update", \n\
"_zend_hash_update", \n\
"_zend_std_compare_objects", \n\
"_zend_string_init_interned", \n\
"_zend_wrong_parameters_none_error", \n\
"_zval_ptr_dtor", \n\
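One way to keep these per-version lists maintainable is to store a single full list plus minimum-version gates and derive each version's flag from it. The sketch below includes only a handful of symbols from the lists above; the version thresholds are one reading of the "ONLY/ABOVE" labels (for example, "ABOVE PHP7.4" read as 8.0 and later), not a verified mapping. It emits the comma-separated -sEXPORTED_FUNCTIONS form that emcc accepts.

```typescript
// Hypothetical helper; thresholds are an interpretation of the labels above.
type PhpVersion = `${number}.${number}`;

const minimumVersion: Record<string, PhpVersion> = {
  _zend_register_internal_class_with_flags: '8.4',
  _zend_declare_typed_class_constant: '8.4',
  _zend_register_ini_entries_ex: '8.1',
  _zend_add_attribute: '8.0',
  _zend_declare_typed_property: '7.4',
};

const fullList = [
  '_zend_parse_parameters',
  '_zend_add_attribute',
  '_zend_register_ini_entries_ex',
  '_zend_declare_typed_property',
  '_zend_register_internal_class_with_flags',
  '_zend_parse_parameters', // duplicates are tolerated and removed below
];

function compareVersions(a: PhpVersion, b: PhpVersion): number {
  const [aMajor, aMinor] = a.split('.').map(Number);
  const [bMajor, bMinor] = b.split('.').map(Number);
  return aMajor !== bMajor ? aMajor - bMajor : aMinor - bMinor;
}

// Keeps ungated symbols plus gated ones whose minimum version is satisfied; dedupes and sorts.
function exportedFunctionsFor(version: PhpVersion, symbols: string[] = fullList): string[] {
  const kept = symbols.filter((s) => {
    const min: PhpVersion | undefined = minimumVersion[s];
    return min === undefined || compareVersions(version, min) >= 0;
  });
  return Array.from(new Set(kept)).sort();
}

function toEmccFlag(symbols: string[]): string {
  return `-sEXPORTED_FUNCTIONS=${symbols.join(',')}`;
}
```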

Setting MAIN_MODULE=2 makes the wasm file 2 MB lighter thanks to this cleanup of exported functions. Unfortunately, it breaks every time a function missing from the list is needed.

To be sure we have the complete set of exported functions for intl, we should run the intl tests with every PHP.wasm version.

I would suggest doing this optimization in a follow-up pull request, since implementing all the necessary exported functions for each PHP version will probably take some time.

As explained above, removing fileinfo from the binary and adding MAIN_MODULE shouldn't add more than 1 MB. But I can run a new benchmark if you want.

So @adamziel, in summary, what do you think of these steps before merging this pull request?

  1. Making fileinfo a dynamic extension, for Node and Web.
  2. Keeping MAIN_MODULE=1 temporarily, until we find a way to run the dynamic extension tests and precisely determine the exported functions needed by each PHP version.

@adamziel (Collaborator) commented Sep 30, 2025

But, as you may understand the issue, even if fileinfo compiles correctly, it won't work anymore since it needs this huge php_magic_database variable. So, I would like to suggest making fileinfo a dynamic extension instead, before continuing to implement this pull request.

But why does it work today with the smaller .wasm file? We're not shipping an additional fileinfo database. Is it optimized out of the build? Or compressed? Can we do the same thing with intl as a dynamic library?

@adamziel (Collaborator)

achieved to build every version of PHP with MAIN_MODULE=2 requiring each version to have its own set of specific EXPORTED_FUNCTIONS in order to make the intl dynamic extension tests pass. Here is the list :

Aha, I see the issue – we need to expose any function that the built intl dynamic library may want to call. Manually listing those exports will only get us so far. To cover 100% of possible code paths, we'll need a script that extracts the imports from the built intl.wasm bundle and confirms they're exposed from the php.wasm bundle – let's run it as a part of the build process, even if only as an assertion after everything is built.
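A build-time assertion along these lines can be sketched with the standard WebAssembly JS API, which lists a module's imports and exports without instantiating it. The sketch assumes Emscripten's dynamic-linking conventions: a side module imports functions from "env" and data/function addresses through "GOT.mem" and "GOT.func". Functions provided by the JS glue rather than by php.wasm (syscalls, for example) would show up as false positives and need an allowlist. The function names are illustrative.

```typescript
// Returns symbols the side module needs that the main module does not export.
function missingMainExports(sideModuleBytes: BufferSource, mainModuleBytes: BufferSource): string[] {
  const side = new WebAssembly.Module(sideModuleBytes); // compile only, no instantiation
  const main = new WebAssembly.Module(mainModuleBytes);
  const exported = new Set(WebAssembly.Module.exports(main).map((e) => e.name));
  const needed = WebAssembly.Module.imports(side)
    .filter(
      (i) => (i.module === 'env' && i.kind === 'function') || i.module === 'GOT.mem' || i.module === 'GOT.func'
    )
    .map((i) => i.name);
  return Array.from(new Set(needed))
    .filter((name) => !exported.has(name))
    .sort();
}

// EXPORTED_FUNCTIONS entries are C symbol names with a leading underscore.
function toExportedFunctions(names: string[]): string[] {
  return names.map((name) => `_${name}`);
}
```

Run after both binaries are built (for example on the bytes read from intl.so and php_8_3.wasm), a non-empty result would fail the build and print the entries to add to EXPORTED_FUNCTIONS.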

@mho22 (Collaborator, Author) commented Oct 1, 2025

But why does it work today with the smaller .wasm file? We're not shipping an additional fileinfo database. Is it optimized out of the build? Or compressed? Can we do the same thing with intl as a dynamic library?

I spent some time digging into why the fileinfo extension behaves differently when building php.wasm with and without MAIN_MODULE, and here's what I found:

1. When we build php.wasm without MAIN_MODULE, the resulting file is significantly smaller (roughly 15–16 MB with fileinfo) than the build with MAIN_MODULE (26–27 MB). Yet the fileinfo extension still works, even without shipping an external magic database.

2. php_magic_database is a 16 MB constant defined in ext/fileinfo/data_file.c and included by apprentice.c, and it is responsible for that significant size increase. In the non-MAIN_MODULE build, this file is compiled directly into the binary, yet the size increase is only about 1 MB instead of 8 MB.

3. Through investigation, it appears that the linker and Emscripten optimizations remove most of the unused data. Specifically, non-MAIN_MODULE builds are "static", meaning the binary doesn't need to export symbols globally.

  1. The linker notices that only a small portion of php_magic_database is ever actually referenced, and it discards the rest. The final wasm file only contains the data actually needed for MIME detection of the types we use.

  2. So it's not compression in the usual sense. The unused array entries are eliminated by linker-level dead data elimination. Only the bytes actually referenced by apprentice.c survive in the binary.

  3. But with MAIN_MODULE, PHP is built as a dynamic core. All global symbols must be exported so that dynamic extensions can reference them, and php_magic_database is a global symbol. As a result, the linker cannot discard any part of php_magic_database, because a future extension might reference it.

Therefore, most of the 16 MB constant ends up included in the wasm file, increasing its size by 8 MB.

This explains why the MAIN_MODULE build is much larger. The "shrinking trick" only works in static builds, where the linker can be sure that nothing outside the main binary needs the data.

I'm stuck: that shrinking would be super useful here, but I have no way to make it happen. I tried a lot of things; modifying apprentice.c mostly did the trick (the wasm file got smaller), but fileinfo then returned errors for some MIME types. I should probably dig into this deeper.


 

Aha, I see the issue – we need to expose any function that the built intl dynamic library may want to call. Manually listing those exports will only get us so far. To cover 100% of possible code paths, we'll need a script that extracts the imports from the built intl.wasm bundle and confirms they're exposed from the php.wasm bundle – let's run it as a part of the build process, even if only as an assertion after everything is built.

This is mostly right! But some missing functions make the tests crash too. I tested this with the PHP 8.3 build, and almost every function I listed above is present, except these:

zend_empty_array
zend_add_attribute
zend_new_interned_string
zend_parse_parameters_ex
zend_ce_traversable
zend_declare_property_null
zend_objects_destroy_object
zend_declare_class_constant_double
zend_declare_class_constant_string
zend_declare_class_constant_null
zend_declare_class_constant_long
zend_register_ini_entries
compiler_globals
zend_known_strings

Every time I ran the Intl tests, they crashed because of one of the missing functions above; I had to recompile after adding it, and so on until the tests passed. So in theory you're right, and the files can be analyzed to list the needed EXPORTED_FUNCTIONS, but in practice some errors still occur even with that process, probably because of missing exported functions needed by the MAIN_MODULE.

FYI, the intl.so file needs 769 imports.

let's run it as a part of the build process, even if only as an assertion after everything is built.

I'm not sure I understand what you mean by "only as an assertion". Do you mean that, at the end of the intl build, I print the list of needed imported functions, for information?

@mho22 (Collaborator, Author) commented Oct 2, 2025

I tried multiple options:

  1. Rewriting apprentice.c to enable shrinking (by making the php_magic_database constant exclusive to that file)
    -> It resulted in an empty php_magic_database variable.

  2. Rebuilding a data_file.c file from the hex dumps of the php.wasm built without MAIN_MODULE
    -> Manually replacing hex failed because Emscripten relies on pointer offsets and relocations. Obviously.

  3. Finding or building a lightweight magic.mgc file (the base file used to create data_file.c)
    -> I couldn't find one, and creating one with arbitrary extensions is probably not a good idea.

  4. Finding a way to build fileinfo separately and inject it into PHP.wasm with MAIN_MODULE
    -> This is basically like creating a SIDE_MODULE, so I should wait for your opinion on this.

  5. Looking for another way to find and use a magic.mgc file, but I didn't find a way to access one from the browser.

I came to the frustrating conclusion that we have two remaining options:

  • Not building PHP.wasm Web with MAIN_MODULE at all
  • Setting up fileinfo as a SIDE_MODULE (and I am not sure the shrinking would happen in a side module as well)

I hope you'll have better ideas and options than mine...

@adamziel (Collaborator) commented Oct 2, 2025

Thank you @mho22! Let me think about that for a moment and follow up here

@adamziel (Collaborator) commented Oct 2, 2025

Could wasm-split help us here? It seems to enable doing a static build and splitting it into the main module and the dynamic library after the fact. We don't need to support arbitrary future PHP extensions but only the set of extensions we're already building so perhaps we could use it to eliminate dead code first and split afterwards. Alternatively, perhaps there's a way we can do a full static build, list all the symbols left inside by the linker, and then do the MAIN_MODULE build and post-process it to remove the symbols that weren't there in the static build?
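The second idea can be expressed as a simple set computation once the symbol lists exist: keep a MAIN_MODULE export only if the static build kept that symbol or a side module we ship imports it, and treat everything else as a candidate for removal. This sketch assumes the three lists have already been extracted (for example with wasm-objdump); the function name and sample symbols are hypothetical, and actually removing the symbols from the binary is a separate step.

```typescript
// Hypothetical planning step for post-processing a MAIN_MODULE build.
interface StripPlan {
  keep: string[];
  strip: string[];
}

function planExportStripping(
  mainModuleExports: string[],
  staticBuildSymbols: string[],
  sideModuleImports: string[]
): StripPlan {
  // A symbol is needed if the static build kept it or a shipped side module imports it.
  const needed = new Set([...staticBuildSymbols, ...sideModuleImports]);
  const keep: string[] = [];
  const strip: string[] = [];
  for (const name of Array.from(new Set(mainModuleExports))) {
    (needed.has(name) ? keep : strip).push(name);
  }
  return { keep: keep.sort(), strip: strip.sort() };
}
```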

@adamziel (Collaborator) commented Oct 2, 2025

Everytime I ran the Intl tests, it crashed because of one of the above missing function, I had to compile again after adding the missing function and so on until the test passed. So in theory you're correct and the files can be analyzed to list the needed EXPORTED_FUNCTIONS but in practice some errors still occur even with that new process, probably based on missing exported functions needed by the MAIN_MODULE.

The built dynamic library should have some imports and exports listed at the top (viewable via wasm2wat or wasm-objdump). Couldn't all the required imports be extracted from there? It would be weird if the dynamic library needed a PHP function and yet it didn't list that as an import. The list of required core PHP exports must be somewhere in the built artifact, either in the binary wasm file or the js file, because it, ultimately, tries to call those functions by name.

@adamziel (Collaborator) commented Oct 2, 2025

Basically, we want the same linking/dead code elimination outcome as when only the main module is built. It seems like Emscripten won't give us that by default, so we need to help it with other tools and, potentially, custom static analysis and rewriting of the generated build. If the linker can prune a large part of php_magic_database when building without the MAIN_MODULE option, we can likely also prune it from the built binary blob (or, at least, patch Emscripten to strategically eliminate dead code anyway when using MAIN_MODULE=2).

@mho22 (Collaborator, Author) commented Oct 3, 2025

I'm thoroughly studying your suggestions and learning a lot about Emscripten and WebAssembly. I haven't found a solution yet. I opened an issue on the Emscripten repository and will go back to Xdebug while waiting for an answer.

@mho22 marked this pull request as draft October 7, 2025 22:36