Implement tofile on tensors to reduce data write time by 40% #210

justinchuby · 2025-10-03T23:35:49Z

This PR introduces a tofile method on tensors (named after numpy's ndarray.tofile). It makes writing external data faster and lowers memory usage by bypassing tobytes().

Compatibility with existing TensorProtocol implementations is maintained in the external data module by calling tofile only when the tensor class provides it. The TorchTensor class in the PyTorch exporter should be updated accordingly so that it uses the new path when saving.
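A minimal, standard-library-only sketch of the fallback described above. `BytesOnlyTensor`, `StreamingTensor` and `write_tensor_data` are hypothetical names, not onnx_ir APIs, and a `bytearray` stands in for the numpy array the real tensors hold:

```python
import io


class BytesOnlyTensor:
    """Hypothetical external tensor that only implements tobytes()."""

    def __init__(self, data: bytes) -> None:
        self._data = bytearray(data)

    def tobytes(self) -> bytes:
        # Materializes a full copy of the tensor data.
        return bytes(self._data)


class StreamingTensor(BytesOnlyTensor):
    """Hypothetical tensor that can write itself without a tobytes() copy."""

    def tofile(self, file) -> None:
        # memoryview exposes the existing buffer, so no intermediate bytes object is built.
        file.write(memoryview(self._data))


def write_tensor_data(tensor, data_file) -> None:
    """Prefer tofile() when the tensor class provides it, else fall back to tobytes()."""
    if hasattr(tensor, "tofile"):
        tensor.tofile(data_file)
    else:
        data_file.write(tensor.tobytes())


buf = io.BytesIO()
write_tensor_data(BytesOnlyTensor(b"\x01\x02"), buf)
write_tensor_data(StreamingTensor(b"\x03\x04"), buf)
print(buf.getvalue())  # b'\x01\x02\x03\x04'
```

Because the check is `hasattr`, tensor classes that predate this change keep working unchanged through the `tobytes()` branch.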

I/O time to disk is reduced by 40%; timings before and after the change are below.

Note

TensorProtocol is not updated because we perform isinstance() checks on external implementations (PyTorch). Adding the method to the protocol would cause the isinstance check to fail for implementations that have not added a tofile method.
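Assuming `TensorProtocol` is a `typing.Protocol` marked `@runtime_checkable` (which the isinstance() checks suggest), the failure mode looks like this. The protocol and class names below are hypothetical:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TensorLike(Protocol):
    """Hypothetical stand-in for the current protocol (no tofile)."""

    def tobytes(self) -> bytes: ...


@runtime_checkable
class TensorLikeWithToFile(Protocol):
    """Hypothetical protocol that also requires tofile."""

    def tobytes(self) -> bytes: ...

    def tofile(self, file) -> None: ...


class ExternalTensor:
    """Stands in for a third-party tensor class that predates tofile."""

    def tobytes(self) -> bytes:
        return b"\x00"


# Runtime-checkable protocols only test that every protocol member exists.
print(isinstance(ExternalTensor(), TensorLike))            # True
print(isinstance(ExternalTensor(), TensorLikeWithToFile))  # False
```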

Reference: https://github.com/microsoft/onnxscript/pull/2241/files/b2381658492510a9bcc8c0a8574db7368e33bceb

Before:

________________________________________________________
Executed in   48.08 secs    fish           external
   usr time   60.54 secs    0.00 millis   60.54 secs
   sys time   23.06 secs    1.22 millis   23.06 secs

After:

________________________________________________________
Executed in   45.69 secs    fish           external
   usr time   60.68 secs  244.00 micros   60.68 secs
   sys time   22.22 secs  518.00 micros   22.22 secs
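These figures come from fish's `time` builtin and cover the whole command, so they may include work other than writing external data. A small helper to express each row as a relative change (values copied from the output above):

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after`; negative means it went down."""
    return (after - before) / before


timings = {
    "wall (Executed in)": (48.08, 45.69),
    "usr time": (60.54, 60.68),
    "sys time": (23.06, 22.22),
}
for name, (before, after) in timings.items():
    print(f"{name}: {relative_change(before, after):+.1%}")
```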

Fix #207

Signed-off-by: Justin Chu <[email protected]>

codecov · 2025-10-03T23:36:58Z

Codecov Report

❌ Patch coverage is 80.70175% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.93%. Comparing base (feb51e5) to head (dafeaf7).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/onnx_ir/_core.py	79.06%	7 Missing and 2 partials ⚠️
src/onnx_ir/external_data.py	66.66%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #210      +/-   ##
==========================================
+ Coverage   76.83%   76.93%   +0.09%     
==========================================
  Files          40       40              
  Lines        4922     4994      +72     
  Branches      980      998      +18     
==========================================
+ Hits         3782     3842      +60     
- Misses        856      864       +8     
- Partials      284      288       +4

justinchuby · 2025-10-04T00:58:05Z

cc @iksnagreb

sonarqubecloud · 2025-10-04T18:56:40Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

justinchuby · 2025-10-10T03:51:55Z

@titaiwangms @gramalingam this is ready for review, thanks.

titaiwangms · 2025-10-10T16:26:20Z

src/onnx_ir/_core.py

+            file: A file-like object with a ``write`` method that accepts bytes, or has a ``fileno()`` method.
+        """
+        if _supports_fileno(file) and isinstance(self._raw, np.ndarray):
+            # This is a duplication of tobytes() for handling special cases


Should we move this into a private function and document why we need it? I would say this is very technical knowledge.
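A sketch of the private helper suggested above, assuming it only needs to decide whether `file` is backed by a real OS file descriptor. The actual `_supports_fileno` body is not shown in this diff, so `supports_fileno` below is a hypothetical reconstruction. The check matters because numpy documents that `ndarray.tofile` bypasses the file object's `write` method and cannot be used with file-like objects that lack `fileno()`, such as `io.BytesIO`, which defines `fileno()` but raises when it is called:

```python
import io
import tempfile


def supports_fileno(file) -> bool:
    """Hypothetical helper: True only if `file` is backed by a real OS file descriptor."""
    fileno = getattr(file, "fileno", None)
    if fileno is None:
        return False
    try:
        fileno()
    except (OSError, ValueError):
        # io.BytesIO raises io.UnsupportedOperation (a subclass of both OSError
        # and ValueError); a closed file raises ValueError.
        return False
    return True


print(supports_fileno(io.BytesIO()))  # False
with tempfile.TemporaryFile() as f:
    print(supports_fileno(f))  # True
```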

titaiwangms · 2025-10-10T16:40:17Z

src/onnx_ir/_core.py

    the tensor is recommended if IO overhead and memory usage is a concern.

    To obtain an array, call :meth:`numpy`. To obtain the bytes,
    call :meth:`tobytes`.


Probably want to mention tofile() here too?

titaiwangms · 2025-10-10T16:52:50Z

src/onnx_ir/_core.py

+            if self._offset is not None:
+                src.seek(self._offset)
+            bytes_to_copy = self._length or self.nbytes
+            chunk_size = 1024 * 1024  # 1MB


How do we know this is the most efficient chunk size? Was it chosen arbitrarily?
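The chunk size mainly bounds how much data is held in memory per read. Nothing in this thread establishes that 1 MiB is optimal; that would need benchmarking on the target storage. A hypothetical standalone version of the loop in the diff, with the chunk size as a parameter:

```python
import io


def copy_byte_range(src, dst, offset, length, chunk_size=1024 * 1024):
    """Copy `length` bytes of `src`, starting at `offset`, into `dst` in bounded chunks.

    Peak memory is roughly one chunk, independent of the tensor size.
    """
    if offset is not None:
        src.seek(offset)
    remaining = length
    while remaining > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError(f"source ended with {remaining} bytes left to copy")
        dst.write(chunk)
        remaining -= len(chunk)


src = io.BytesIO(bytes(range(10)))
dst = io.BytesIO()
copy_byte_range(src, dst, offset=2, length=5, chunk_size=2)
print(dst.getvalue())  # b'\x02\x03\x04\x05\x06'
```

`shutil.copyfileobj` exposes a similar buffer-size knob (`length`), but it copies until EOF rather than a fixed byte range.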

titaiwangms · 2025-10-10T17:42:40Z

src/onnx_ir/_core.py

        """Return the bytes of the tensor."""
        return self._evaluate().tobytes()

+    def tofile(self, file) -> None:


I am wondering whether tofile() makes sense for LazyTensor.
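If LazyTensor does get a tofile(), one option is to delegate to whatever `_evaluate()` returns. The sketch below uses hypothetical classes; evaluation still materializes the tensor, so the saving is limited to skipping the extra tobytes() copy when the evaluated tensor supports tofile:

```python
import io


class EagerTensor:
    """Hypothetical materialized tensor with both write paths."""

    def __init__(self, data: bytes) -> None:
        self._data = bytearray(data)

    def tobytes(self) -> bytes:
        return bytes(self._data)

    def tofile(self, file) -> None:
        file.write(memoryview(self._data))


class LazyTensorSketch:
    """Hypothetical lazy tensor: the data is produced only when needed."""

    def __init__(self, func) -> None:
        self._func = func

    def _evaluate(self):
        return self._func()

    def tobytes(self) -> bytes:
        return self._evaluate().tobytes()

    def tofile(self, file) -> None:
        # Evaluation always happens; only the tobytes() copy can be skipped.
        evaluated = self._evaluate()
        if hasattr(evaluated, "tofile"):
            evaluated.tofile(file)
        else:
            file.write(evaluated.tobytes())


buf = io.BytesIO()
LazyTensorSketch(lambda: EagerTensor(b"xyz")).tofile(buf)
print(buf.getvalue())  # b'xyz'
```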

titaiwangms · 2025-10-10T17:50:17Z

src/onnx_ir/external_data.py

                data_file.write(b"\0" * (current_offset - file_size))
-            data_file.write(raw_data)
+
+            if hasattr(tensor, "tofile"):


Which tensors do not have tofile()? torch? Better to document this.

justinchuby added 5 commits October 3, 2025 15:55

Fix endian

82b3f58

Signed-off-by: Justin Chu <[email protected]>

nvm

42d8edc

Signed-off-by: Justin Chu <[email protected]>

More implementations

63310c1

Signed-off-by: Justin Chu <[email protected]>

tofile

290ab6c

Signed-off-by: Justin Chu <[email protected]>

hasattr

1b53a6a

Signed-off-by: Justin Chu <[email protected]>

justinchuby added 5 commits October 3, 2025 17:19

tofile!

c05e189

Signed-off-by: Justin Chu <[email protected]>

write

6377435

Signed-off-by: Justin Chu <[email protected]>

always write numpy

3dc5704

Signed-off-by: Justin Chu <[email protected]>

Maintain reference

7fd35d7

Signed-off-by: Justin Chu <[email protected]>

Merge branch 'main' into justinchu/write

40cb60d

justinchuby marked this pull request as ready for review October 4, 2025 00:44

justinchuby requested review from a team and titaiwangms as code owners October 4, 2025 00:44

justinchuby requested a review from gramalingam October 4, 2025 00:44

justinchuby added the module: api label Oct 4, 2025

justinchuby mentioned this pull request Oct 4, 2025

Be smarter about torch tensors jambayk/torch-onnx-models#43

Merged

justinchuby added this to the 0.1.11 milestone Oct 4, 2025

justinchuby changed the title from "Implement tofile on tensors" to "Implement tofile on tensors to reduce data write time by 40%" Oct 6, 2025

justinchuby commented Oct 6, 2025

src/onnx_ir/tensor_adapters.py (outdated, resolved)

justinchuby added 4 commits October 9, 2025 13:10

Fix fileno

909344d

Signed-off-by: Justin Chu <[email protected]>

Test

e7dc301

Signed-off-by: Justin Chu <[email protected]>

test

8f832b3

Signed-off-by: Justin Chu <[email protected]>

Create tests

9afc144

Signed-off-by: Justin Chu <[email protected]>

naming

1f87be1

Signed-off-by: Justin Chu <[email protected]>

justinchuby mentioned this pull request Oct 10, 2025

[ONNX] Implement tofile on tensor pytorch/pytorch#165120

Open

versionadded

2e06f50

Signed-off-by: Justin Chu <[email protected]>

Add tests

dafeaf7

Signed-off-by: Justin Chu <[email protected]>

titaiwangms reviewed Oct 10, 2025