Transformers.js V4: Native WebGPU EP, repo restructuring, and more! #1382

xenova · 2025-07-31T03:16:21Z

This is the official, long-awaited PR that introduces Transformers.js V4. Although it's currently still in draft mode, I'll be posting updates here for early review!

See benchmarks

https://huggingface.co/onnx-community/all-MiniLM-L6-v2-ONNX:

https://huggingface.co/onnx-community/bge-base-en-v1.5-ONNX:

Improved repository formatting & consistency
- Organize modeling code into separate folders (./src/models/), grouped by model type -- models.js is getting pretty large!
- Formatting will happen later to keep the number of changes between v3 and v4 lower.
Documentation improvements and updates (particularly for older models).
Fix typing issues
- Closes Getting TS2304: Cannot find name 'PretrainedProcessorOptions'. error #1409
React native support
New models:
- Chatterbox. Closes Add support for Chatterbox #1434

* ONNX Runtime improvements (experimental native webgpu; fix iOS) (#1231) * customize the wasm paths * update implementation * allow using 'webgpu' in nodejs binding * update version of onnxruntime-node * Upgrade onnxruntime-web to same version as onnxruntime-node * Update list of supported devices --------- Co-authored-by: Joshua Lochner <[email protected]> * customize the wasm paths (#1250) * customize the wasm paths * update implementation * [internal] Add is_decoder option to session retrieval for preferred output location * Update tests * Formatting * Bump ort versions * Bump onnxruntime-node version * Bump versions * Bump ORT versions * Bump versions * Only check webgpu fp16 for non-node environments * Fix * Assume node supports webgpu * Update ORT node support comment * Relax test strictness * Update conversion script versions * Downgrade onnxslim * cleanup * Update package-lock.json * Update onnxruntime versions * Update post-build script * Use built-in session release function * Call garbage collection after each tokenizer test * Do not double-throw error * Fix race-condition in build process with file removal * Update versions * Bump jinja version * [version] Update to 3.6.3 * Bump jinja version to support new features * [version] Update to 3.6.3 * Add support for LFM2 models (#1367) * Use prefix in lfm2 output location (#1369) * Update package-lock.json * Run `npm audit fix` * Add special tokens in text-generation pipeline if tokenizer requires (#1370) * Add special tokens in text-generation pipeline if tokenizer requires * Fix logits processors tests * Update bundles.test.js * Update comment * Formatting * Add support for ModernBERT Decoder (#1371) * Use from/to buffer instead of string Actually fixes #1343 * Add support for Voxtral (#1373) * Support longform voxtral processing (#1375) * [version] Update to 3.7.0 * Add support for Arcee (#1377) * Optimize tensor.slice() (#1381) * Optimize tensor.slice() The performance of executing `tensor.slice()` is super poor, especially for the 'logits' tensor with large dimensions. ``` const logits = outputs.logits.slice(null, -1, null);` ``` This is because currently implementation of the `slice` method manually iterates through each element and calculate indices which is a big time consuming if the tensor shape is large. For cases like `slice(null, -1, null)`, where the slicing operation is contiguous along certain dimensions, which can be optimized by bulk copy by using `TypeArray.subarray()` and `TypeArray.set()`. * nit * Add a few more tensor slice unit tests --------- Co-authored-by: Joshua Lochner <[email protected]> --------- Co-authored-by: Yulong Wang <[email protected]> Co-authored-by: Wanming Lin <[email protected]>

alfredomariamilano · 2025-09-03T15:19:10Z

@xenova is there an ETA for v4? I've pulled down the repo and tested it to use webgpu in Node/Electron and it was chef's kiss

EDIT: Even an alpha channel on npm would be beneficial and more people could try it out too.

bil-ash · 2025-09-06T02:41:56Z

@xenova Please add 2-bit quantization, the support for which has been added in recent commits of onnxruntime(for CPU and WebGPU, not sure about WASM)

xenova and others added 30 commits December 23, 2024 14:10

Move adaptive retrieval demo

1f1c645

Move code completion demo

363aefa

Move florence-2 demo

6c96600

Move depth anything demo

c4b3a76

Move tokenizer playground demo

29bdc5b

Move node audio processing demo

98958ff

Move remove background web demo

ac5b758

Move webgpu whisper demo

b3ac89e

Move depth estimation video demo

48b46d6

Move vanilla js demo

a361b38

Move node demo

1634399

Move whisper word timestamps demo

4f7b7f4

Move musicgen web demo

b0ac200

Move cross encoder demo

4ccb405

Move text-to-speech client demo

bc006e5

Move video object detection demo

7cdea27

Move video background removal demo

b053db2

Move Segment Anything demo

eacdbd1

Move zero-shot classification demo

e890716

Move browser extension template

1ee3d46

Move semantic image search demo

56297b0

Move semantic audio search demo

97a3c48

Move webgpu embedding benchmark demo

0bf2fec

Merge branch 'main' into move-examples

c033537

Merge branch 'main' into move-examples

e20167a

Merge branch 'main' into v4

140f106

Upgrade sharp version

bf12225

Bump versions

5e2c942

Delete gh-pages.yml

9f9f6f2

xenova added 11 commits July 31, 2025 00:01

Update .prettierrc

276f3e7

Formatting

3c7cf47

Fix type issues

3b4f4c4

Bump onnxruntime versions

f9ca642

Bump onnxruntime versions

90dccb2

Bump versions again

9f20bd5

Bump versions again again

719a6f8

Update versions

129a2a2

Allow config to disable kv_cache inputs

e585ff9

fix

682772d

Bump onnxruntime versions

fd9f523

xenova added 6 commits September 8, 2025 19:35

Bump onnxruntime versions

56d59ab

Merge branch 'main' into v4

97e59ba

Fix typo

23e98d9

Improve typing for TemplateProcessing post processor

4afdd4c

Update hub types

da893aa

Update tokenizers

0252d88

This was referenced Sep 30, 2025

"Rotary interleaved attention is not supported" error in WebGPU implementation for MobileLLM #1416

Open

EmbeddingGemma usage #1418

Open

WebGPU/CoreML support for node.js #1430

Open

xenova added 5 commits October 11, 2025 00:10

Add support for Chatterbox

3f30425

Fix types

ec3d845

Support voice caching

a5d6f0f

Update GenerationFunctionParameters type

fc244df

Use init chain instead of single wasmInitPromise

e7fe55b

This was referenced Oct 13, 2025

Add support for Chatterbox #1434

Open

Kokoro with WebGPU on Android produces corrupted audio #1320

Open

Error: using ceil() in shape computation is not yet supported for MaxPool #1435

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Transformers.js V4: Native WebGPU EP, repo restructuring, and more! #1382

Transformers.js V4: Native WebGPU EP, repo restructuring, and more! #1382

xenova commented Jul 31, 2025 •

edited

Loading

Uh oh!

alfredomariamilano commented Sep 3, 2025 •

edited

Loading

Uh oh!

bil-ash commented Sep 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Transformers.js V4: Native WebGPU EP, repo restructuring, and more! #1382

Are you sure you want to change the base?

Transformers.js V4: Native WebGPU EP, repo restructuring, and more! #1382

Conversation

xenova commented Jul 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alfredomariamilano commented Sep 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bil-ash commented Sep 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

xenova commented Jul 31, 2025 •

edited

Loading

alfredomariamilano commented Sep 3, 2025 •

edited

Loading