v0.6.1.post2

github-actions released this 13 Sep 18:35

Highlights

This release contains an important bug fix for final output truncation when token streaming is combined with a stop string (#8468).
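To make the interaction concrete, here is a minimal, library-independent sketch of stop-string handling over a streamed response. It is not the code changed in #8468; the function name `stream_with_stop` and its parameters are hypothetical, and stop strings are assumed to be non-empty. It shows why a streamer typically holds back a few trailing characters (a stop string may be split across chunks) and why that held-back tail has to be flushed at the end so the final output is not cut short.

```python
from typing import Iterable, Iterator, List


def stream_with_stop(chunks: Iterable[str], stop: List[str]) -> Iterator[str]:
    """Yield text deltas, ending just before the first stop string.

    Hypothetical helper for illustration only; assumes non-empty stop strings.
    """
    # Hold back enough trailing characters that a stop string split across
    # chunks can be detected before any part of it has been emitted.
    hold = max((len(s) for s in stop), default=1) - 1
    text = ""    # all text received so far
    emitted = 0  # number of characters already yielded

    for chunk in chunks:
        text += chunk
        # A new match can only start at or after the emitted boundary.
        hits = [i for i in (text.find(s, emitted) for s in stop) if i != -1]
        if hits:
            cut = min(hits)
            if cut > emitted:
                yield text[emitted:cut]
            return
        safe = len(text) - hold
        if safe > emitted:
            yield text[emitted:safe]
            emitted = safe

    # The stream ended without a stop string: flush the held-back tail so
    # the final output is complete.
    if len(text) > emitted:
        yield text[emitted:]
```

For example, `"".join(stream_with_stop(["Hello wo", "rld<E", "ND> ignored"], ["<END>"]))` evaluates to `"Hello world"`, even though the stop string arrives split across two chunks.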

What's Changed

[CI/Build] Enable InternVL2 PP test only on single node by @Isotr0py in #8437
[doc] recommend pip instead of conda by @youkaichao in #8446
[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 by @jeejeelee in #8442
[misc][ci] fix quant test by @youkaichao in #8449
[Installation] Gate FastAPI version for Python 3.8 by @DarkLight1337 in #8456
[plugin][torch.compile] allow to add custom compile backend by @youkaichao in #8445
[CI/Build] Reorganize models tests by @DarkLight1337 in #7820
[Doc] Add oneDNN installation to CPU backend documentation by @Isotr0py in #8467
[HotFix] Fix final output truncation with stop string + streaming by @njhill in #8468
bump version to v0.6.1.post2 by @simon-mo in #8473

Full Changelog: v0.6.1.post1...v0.6.1.post2

Contributors

njhill, jeejeelee, and 4 other contributors
