
Conversation

@kevinyang-cky (Collaborator)

see #21

Removed duplicate description of distributed scalers and corrected DataFrame creation and reading methods.
@kevinyang-cky changed the title from "Address issue #21" to "Address issue 21" on Nov 12, 2025
This module mirrors the code structure and methodology of distributed.py, but focuses specifically on implementing distributed tensor-scaling classes for PyTorch. 

Obsolete attributes (e.g., self.is_array) and unused methods (such as extract_array, get_column_order, and package_transformed_x) from distributed.py have been removed. The extract_x_columns method has also been simplified.

For the fit method, input tensors are expected to be free of NaN values—a reasonable requirement since training datasets should not contain NaNs.
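To make that expectation concrete, here is a minimal sketch of the kind of guard a caller could apply before fitting; the helper name and error message are illustrative and not taken from the diff:

```python
import torch

def assert_no_nans(x: torch.Tensor) -> torch.Tensor:
    # fit() assumes NaN-free inputs, so any cleaning or imputation happens upstream.
    if torch.isnan(x).any():
        raise ValueError("Input tensor contains NaN values; clean the data before fitting.")
    return x
```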

The module requires PyTorch 2.8.0, which is enforced via an assertion at initialization.
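A rough sketch of what such an init-time guard could look like (the exact assertion in the module is not shown here, and the version floor is later relaxed to 2.0.0 during review):

```python
import torch
from packaging.version import Version

# Illustrative version guard; strip any local build suffix such as "+cu121" before comparing.
TORCH_MIN_VERSION = "2.8.0"
assert Version(torch.__version__.split("+")[0]) >= Version(TORCH_MIN_VERSION), (
    f"distributed_tensor requires PyTorch >= {TORCH_MIN_VERSION}, found {torch.__version__}"
)
```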
@kevinyang-cky changed the title from "Address issue 21" to "Address issue 21 and add distributed_tensor module" on Nov 21, 2025
save_scaler is commented out for now, as the custom serialization for tensors still needs to be built.
Moving the tests for the distributed_tensor module to a separate script.
Add unit tests for DStandardScalerTensor and DMinMaxScalerTensor, following the example in distributed_test.py for DStandardScaler and DMinMaxScaler.
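As a sketch of what one of these parity tests might look like (the module paths and the fit/transform signatures are assumptions here, following the scikit-learn-style usage shown in the docs):

```python
import numpy as np
import torch
from bridgescaler.distributed import DStandardScaler               # path assumed
from bridgescaler.distributed_tensor import DStandardScalerTensor  # path assumed


def test_dstandard_scaler_tensor_matches_ndarray():
    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 4)).astype(np.float32)

    dss_np = DStandardScaler()
    dss_np.fit(x)

    dss_t = DStandardScalerTensor()
    dss_t.fit(torch.from_numpy(x))

    # The ndarray and tensor scalers should produce matching transforms
    # to within floating-point tolerance.
    np.testing.assert_allclose(
        np.asarray(dss_np.transform(x)),
        dss_t.transform(torch.from_numpy(x)).numpy(),
        rtol=1e-5,
        atol=1e-6,
    )
```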
@kevinyang-cky (Collaborator, Author) commented Nov 22, 2025

Besides addressing issue #21, I have also added the distributed_tensor module distributed_tensor.py, which is the torch.tensor version of DStandardScaler and DMinMaxScaler. The unit test script is distributed_tensor_test.py.

DStandardScalerTensor and DMinMaxScalerTensor were also tested against the example in the docs and produced identical results (see screenshots below). Keep in mind, though, that in that example dss_combined = np.sum([dss_1, dss_2]) cannot simply be converted to dss_combined = torch.sum([dss_1, dss_2]), since torch.sum() only accepts tensors. Use dss_combined = dss_1 + dss_2 instead. I can also add this check to the unit test script if it is worth it.
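To make the caveat concrete, a small sketch (the import path is assumed, and the scalers are taken to support + for combining fitted statistics, as described above):

```python
import torch
from bridgescaler.distributed_tensor import DStandardScalerTensor  # path assumed

dss_1 = DStandardScalerTensor()
dss_1.fit(torch.randn(50, 3))
dss_2 = DStandardScalerTensor()
dss_2.fit(torch.randn(80, 3))

# np.sum([dss_1, dss_2]) works for the ndarray scalers because np.sum reduces the
# list with the scalers' addition operator, but torch.sum() only accepts tensors,
# so the tensor scalers are combined with + directly:
dss_combined = dss_1 + dss_2
```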

Happy Thanksgiving! :)


[Screenshots: ndarray_version and tensor_version results]

@djgagne (Collaborator) left a comment


I have a few small requested changes to help the CI tests pass and to keep the module working with more than just the base version of PyTorch.

The restriction to PyTorch 2.8.0 applied only to an early iteration of the code and is no longer relevant. 

According to the documentation, the "unbiased" argument in torch.var was renamed to "correction" beginning with PyTorch 2.0; therefore, impose a minimum version requirement of 2.0.0.

Tested the module with the latest version 2.9.1, and other versions >= 2.0.0 worked fine.
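For reference, the renamed argument in question (a minimal illustration assuming PyTorch >= 2.0, where correction replaces the old unbiased flag):

```python
import torch

x = torch.arange(6, dtype=torch.float32).reshape(3, 2)

# correction=1 gives the unbiased (sample) variance, equivalent to the old unbiased=True;
# correction=0 gives the biased (population) variance, equivalent to unbiased=False.
var_sample = torch.var(x, dim=0, correction=1)
var_population = torch.var(x, dim=0, correction=0)
```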
@kevinyang-cky (Collaborator, Author)

Comments addressed, and CI tests passed.

@djgagne (Collaborator) left a comment


LGTM

@djgagne merged commit e87bd69 into main on Dec 4, 2025
1 check passed