Releases: neuralmagic/compressed-tensors
Compressed Tensors v0.11.0
What's Changed
- Fix nightly issues with python 3.11 by @dhuangnm in #371
- Clean up disk space and restore ubuntu 22.04 runner by @dhuangnm in #373
- [Transform] Update tests to use conftest file by @kylesayrs in #367
- [Transform] Hadamard Permutations by @kylesayrs in #329
- [Transform] Construct on GPU, cache on CPU by @kylesayrs in #352
- enable code coverage collection and reporting INFERENG-1049 by @derekk-nm in #382
- Deprecate `iter_named_leaf_modules` and `iter_named_quantizable_modules` by @kylesayrs in #381
- Added support for compression on meta device by @shanjiaz in #376
- Add torch.float64 as a viable dtype for scales by @eldarkurtic in #379
- [Transform] `apply_transform_config` by @kylesayrs in #348
- [Compression] Fix compression device movement in cases of indexed devices by @kylesayrs in #384
- Enable code coverage report for nightly tests by @dhuangnm in #388
- [Bugfix] Only quant-compress modules with weight quantization by @kylesayrs in #387
- [Transform] Fix config serialization by @kylesayrs in #396
- [Transform] Do not fuse div operation into hadamard matrices by @kylesayrs in #395
- [Transform] Implement multi-headed transforms by @kylesayrs in #383
- Support DeepSeekV3-style block FP8 quantization by @mgoin in #372
- [Transform] [Utils] Canonical matching utilities by @kylesayrs in #392
- [Bugfix] Safeguard against submodule parameter deletion in decompress_model by @kylesayrs in #347
- fix block quantization initialization by @shanjiaz in #403
- [Utils] Skip internal modules when matching by @kylesayrs in #404
- [Quantization][Decompression] Fix QDQ for dynamic quant; Update NVFP4 Compression Params by @dsikka in #407
- [Utils] Support matching vLLM modules by @kylesayrs in #413
- Fix block size inference logic by @shanjiaz in #411
- [Transform] Serialize with tied weights by @kylesayrs in #370
- [Transform] [Utils] Support precision, add torch dtype validation by @kylesayrs in #414
- [Transform] Serialize transforms config by @kylesayrs in #412
- Error when configs are created with unrecognized fields by @kylesayrs in #386
- revert forbid constraint on QuantizationConfig by @brian-dellabetta in #418
- Revert "[Transform] Serialize transforms config (#412)" by @dsikka in #419
- added wrapper for execution device by @shanjiaz in #417
- [Transform] Serialize config (include format) by @dsikka in #420
- exclude transform_config from quantization_config parse by @brian-dellabetta in #421
- [Quantization] Support more than one quant-compressor by @dsikka in #415
- [QuantizationScheme] Validate format by @dsikka in #424
- [Utils] Expand `is_match` by @kylesayrs in #416
- Fix match.py syntax by @shanjiaz in #426
- [Offload] Fully remove dispatch by @kylesayrs in #427
Full Changelog: 0.10.2...0.11.0
Compressed Tensors v0.10.2
What's Changed
- [Hotfix] Implement quantization compressor methods on dense compressor by @kylesayrs in #344
- [Hotfix] Implement method on dense compressor by @kylesayrs in #345
- [Transform] Factory classes with shared memory and offloading by @kylesayrs in #316
- [Transform] [Bugfix] Fix enum value serialization in python>=3.11 by @kylesayrs in #350
- Remove redundant call by @eldarkurtic in #349
- [Accelerate] Rename and simplify `force_cpu_offload` by @kylesayrs in #354
- [Transform] Extend set of known Hadamard matrices by @kylesayrs in #351
- [Accelerate] Fix `offloaded_dispatch`, implement `disable_offloading` by @kylesayrs in #355
- [Accelerate] Extend functionality of `register_offload_parameter` by @kylesayrs in #356
- [Bugfix] Fix saving of models dispatched by `offloaded_dispatch` by @kylesayrs in #357
- [Bugfix] Only update direct params in `disable_offloading` by @kylesayrs in #360
- Reference updated reportportal_submit_execution_results action by @derekk-nm in #362
- [Accelerate] Expand `get_execution_device` to support models by @kylesayrs in #363
- [Accelerate] Fix typos in `get_execution_device` by @kylesayrs in #365
New Contributors
- @derekk-nm made their first contribution in #362
Full Changelog: 0.10.1...0.10.2
Compressed Tensors v0.10.1
What's Changed
- [Transform] Hadamard and Matrix Transform Utils by @kylesayrs in #330
- Fix error on import whenever accelerate is absent by @maresb in #342
Full Changelog: 0.10.0...0.10.1
Compressed Tensors v0.10.0
What's Changed
- Updates to build system by @dbarbuzzi in #304
- [Utils] add align_modules by @kylesayrs in #282
- Enable module state_dict compression, simplify compression logic by @kylesayrs in #302
- Fix `_initialize_scale_zero_point` initializing on the wrong device by @mgoin in #295
- Revert "Enable module state_dict compression, simplify compression lo… by @kylesayrs in #306
- [Bugfix] Fix shape calculation for group quantization by @kylesayrs in #308
- Enable module state_dict compression, simplify compression logic by @kylesayrs in #307
- Clarify decompression return type by @kylesayrs in #310
- Clarify `match_param_name` return type by @kylesayrs in #312
- [Compressor][NVFP4] Support FP4 Compression by @dsikka in #311
- [NVFP4] Update FloatArgs and NVFP4 by @dsikka in #313
- fix signatures on model_validator functions by @brian-dellabetta in #314
- [Performance] Add memory compression and decompression pathways by @kylesayrs in #301
- Model Compression: Set compression status by @kylesayrs in #318
- [NVFP4] Enable Fp4 Quantization; introduce / apply global_scales by @dsikka in #315
- [NVFP4] Skip fused global scale calculation if already fused by @dsikka in #322
- Update default observer to be `MSE` by @shanjiaz in #300
- [Misc] Generics typehinting for `RegistryMixin` by @kylesayrs in #320
- Revert "Update default observer to be `MSE` (#300)" by @dsikka in #323
- [NVFP4] Add `tensor_group` strategy; enable NVFP4 Activations by @dsikka in #317
- [Transforms] Transform Args, Scheme, and Config by @kylesayrs in #321
- [NVFP4] Expand dynamic types, clean-up conditions by @dsikka in #325
- Use different runner for UPLOAD job by @dbarbuzzi in #327
- [NVFP4] Use torch.compile when rounding to NVFP4 by @dsikka in #331
- [Tests] Update test_fp8_quant.py by @dsikka in #337
- [Tests] Fix test scale init for group quant by @dsikka in #338
- [Quantization] Update group quantization by @dsikka in #336
- [NVFP4] update global scale generation by @dsikka in #339
- [Transform] Accelerate Utilities by @kylesayrs in #328
- Model Compression: Delete offload by @kylesayrs in #319
- [Decompression] Keep unused parameters when decompressing from memory by @kylesayrs in #340
- [NVFP4] Small Nits by @dsikka in #341
Full Changelog: 0.9.4...0.10.0
Compressed Tensors v0.9.4
What's Changed
- Remove compression_ratio calculation by @dsikka in #293
- Build with setuptools scm by @dhuangnm in #292
- fix a few minor issues by @dhuangnm in #294
- Some fixes for AWQ by @rahul-tuli in #269
- Fix upload issue when package already existed on PyPI by @dhuangnm in #297
- Update action tags by @dhuangnm in #298
- Pick up fix from nm-actions by @dhuangnm in #299
- [Compressor] Update packed compressor to support zp packing by @dsikka in #296
- [Decompression] Update Decompression Lifecycle by @dsikka in #285
- [Accelerate] allow get_execution_device to be used when initializing a model by @kylesayrs in #303
Full Changelog: 0.9.3...0.9.4
Compressed Tensors v0.9.3
What's Changed
- remove testmo by @dhuangnm in #258
- update tag for summary-test action by @dhuangnm in #259
- [Bugfix] Support offloaded parameters when initializing KV cache parameters by @kylesayrs in #261
- Update: CompressedLinear to decompress once by @rahul-tuli in #266
- [BugFix]: `AttributeError` in `CompressedLinear` by @rahul-tuli in #273
- Fix case when using weight_packed, not weight by @dsikka in #278
- Report test results to Report Portal by @dhuangnm in #271
- use fine-grained token for workflow by @dhuangnm in #283
- Rectify Asym Compression/Decompression Pathways by @dsikka in #225
- Bump CT Version by @dsikka in #288
Full Changelog: 0.9.2...0.9.3
Compressed Tensors v0.9.2
What's Changed
- ModelCompressor type checking import by @kylesayrs in #220
- Fix warning for dynamic quantization args by @kylesayrs in #227
- Deprecate get_observer by @kylesayrs in #214
- Accelerate Utilities: Throw warning when updating with different shapes by @kylesayrs in #231
- Use faster operations on packed-quantized, add tests by @horheynm in #211
- Update build workflow to Python 3.12 by @dbarbuzzi in #248
- Replace `COMPRESSION_PARAM_NAMES` with Abstract Property by @rahul-tuli in #249
- Kylesayrs/update readme by @brian-dellabetta in #252
- Add: missing and unexpected keys in ModelCompressor by @rahul-tuli in #250
- switch runners by @dhuangnm in #254
- Bump version for patch release by @dsikka in #255
Full Changelog: 0.9.1...0.9.2
Compressed Tensors v0.9.1
What's Changed
- BugFix: Shape should be a flat list by @rahul-tuli in #241
- Bump: Compressed Tensors version for release by @rahul-tuli in #244
Full Changelog: 0.9.0...0.9.1
Compressed Tensors v0.9.0
What's Changed
- Replace deprecated pydantic functions by @kylesayrs in #221
- [Bugfix] Update expected shape for per token strategy by @kylesayrs in #210
- Accelerate Utilities by @kylesayrs in #193
- Update & fix Testmo actions/logic by @dbarbuzzi in #230
- Composability by @rahul-tuli in #219
- Add 24 sparse bitmask by @rahul-tuli in #235
- Fix: Disable Sparse Decompression for Dense Compressors by @rahul-tuli in #237
- Update: Test for Compatibility with Transformers 4.48 by @rahul-tuli in #239
- Inline 'get_release_and_version' definition by @dbarbuzzi in #240
- switch runners from a100 to h100 for now by @dhuangnm in #242
- bump for release by @dsikka in #243
New Contributors
- @dbarbuzzi made their first contribution in #230
Full Changelog: 0.8.1...0.9.0
Compressed Tensors v0.8.1
What's Changed
- Skip accelerate tests by @kylesayrs in #208
- Remove QuantizationScheme.default_scheme by @kylesayrs in #202
- Allow ModelCompressor.from_pretrained to load from quantization_config, not compression config by @horheynm in #207
- Quantization Scheme Validation by @kylesayrs in #209
- Fix uninitialized variable in quantized compressors by @markmc in #205
- Implement aliasable mixin and alias activation ordering by @kylesayrs in #213
- Revert "Implement aliasable mixin and alias activation ordering (#213)" by @dsikka in #217
- Implement aliasable mixin and alias activation ordering (python3.9 fix) by @kylesayrs in #218
- bump by @dsikka in #226
Full Changelog: 0.8.0...0.8.1