v0.6.1.post2
Released by github-actions on 13 Sep, 18:35
Highlights
- This release contains an important bugfix: the final output could be truncated when token streaming was combined with a stop string (#8468)
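
To illustrate why streaming and stop strings interact badly, here is a minimal sketch in Python. It is not the vLLM implementation, and the function name `stream_with_stop` is hypothetical. It assumes a streamer has to hold back a few trailing characters so that a stop string split across chunks is never partly emitted, and that it must flush the held-back tail when generation ends, otherwise the final output comes out truncated.

```python
from typing import Iterable, Iterator, List


def stream_with_stop(chunks: Iterable[str], stop: List[str]) -> Iterator[str]:
    """Yield text deltas, ending just before the first stop string.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    # Hold back enough characters that a stop string split across
    # chunks can still be detected before any part of it is emitted.
    hold = max((len(s) for s in stop), default=1) - 1
    buffer = ""  # all text received so far
    sent = 0     # number of characters of `buffer` already yielded

    for chunk in chunks:
        buffer += chunk
        # Earliest occurrence of any stop string in the unsent region.
        hits = [i for i in (buffer.find(s, sent) for s in stop) if i != -1]
        if hits:
            cut = min(hits)
            if cut > sent:
                yield buffer[sent:cut]
            return
        # Emit everything except the held-back tail.
        safe = len(buffer) - hold
        if safe > sent:
            yield buffer[sent:safe]
            sent = safe

    # Generation ended without a stop string: flush the held-back tail.
    # Skipping this step would truncate the final output.
    if len(buffer) > sent:
        yield buffer[sent:]


if __name__ == "__main__":
    pieces = ["Hello, wor", "ld! ST", "OP ignored"]
    print(repr("".join(stream_with_stop(pieces, ["STOP"]))))
```

In this sketch the stop string itself is excluded from the output, and the concatenation of all yielded deltas equals the text a non-streaming call would return under the same assumptions.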
What's Changed
- [CI/Build] Enable InternVL2 PP test only on single node by @Isotr0py in #8437
- [doc] recommend pip instead of conda by @youkaichao in #8446
- [Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 by @jeejeelee in #8442
- [misc][ci] fix quant test by @youkaichao in #8449
- [Installation] Gate FastAPI version for Python 3.8 by @DarkLight1337 in #8456
- [plugin][torch.compile] allow to add custom compile backend by @youkaichao in #8445
- [CI/Build] Reorganize models tests by @DarkLight1337 in #7820
- [Doc] Add oneDNN installation to CPU backend documentation by @Isotr0py in #8467
- [HotFix] Fix final output truncation with stop string + streaming by @njhill in #8468
- bump version to v0.6.1.post2 by @simon-mo in #8473
Full Changelog: v0.6.1.post1...v0.6.1.post2