[release/v1.21.0]: Updating docs for 1.21.0 #683
base: main
Conversation
Signed-off-by: Abukhoyer Shaik <[email protected]>
docs/source/release_docs.md
Outdated
- **Mistral 3.1 (24B)**
- Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
- Production-ready deployment
remove
docs/source/release_docs.md
Outdated
- **Olmo2**
- Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
- Full CausalLM support with optimizations
- Bug fixes included
remove
Signed-off-by: Abukhoyer Shaik <[email protected]>
We are missing a few more models: Olmo, Molmo, and Wav2Vec2. @abukhoy, please add these models as well.
Signed-off-by: Abukhoyer Shaik <[email protected]>
docs/source/release_docs.md
Outdated
## Key Features & Enhancements
- **Framework Upgrades**: Transformers `4.55`, PyTorch `2.7.0+cpu`, Torchvision `0.22.0+cpu`
- **Python Support**: Now requires Python `>=3.9`
Better to keep Python support as 3.10.
@@ -0,0 +1,84 @@
# Diffuser Classes
Can we follow an approach similar to qeff_autoclasses.html? Add small examples and keep only the user-exposed classes. @quic-amitraj, can you suggest an approach?
```
(QEFFAutoModelForCTC)=
## `QEFFAutoModelForCTC`
Please add an example here.
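As background for the requested `QEFFAutoModelForCTC` example: CTC models emit a frame-level token per timestep, which is post-processed by greedy CTC decoding. A minimal, self-contained sketch of that decoding step in plain Python (no QEfficient dependency; the function name and the `blank_id=0` convention here are illustrative, not the library's API):

```python
def ctc_greedy_decode(token_ids, blank_id=0):
    """Standard greedy CTC decoding: collapse repeated tokens, then drop blanks."""
    decoded, prev = [], None
    for t in token_ids:
        # Keep a token only when it differs from the previous frame and is not blank.
        if t != prev and t != blank_id:
            decoded.append(t)
        prev = t
    return decoded

# A frame-level argmax sequence from a CTC head (0 = blank token):
frames = [0, 7, 7, 0, 3, 3, 3, 0, 7]
print(ctc_greedy_decode(frames))  # -> [7, 3, 7]
```

Any real example in the docs should instead show the class's own `from_pretrained`/compile/run flow, per the library's public API.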
docs/source/supported_features.rst
Outdated
- Enabled for GPT-OSS models, allowing for flexible deployment of large language models across different hardware configurations.
* - `ONNX Sub-Functions <https://github.com/quic/efficient-transformers/pull/621>`_
- Feature enabling more efficient model compilation and execution on hardware.
* - `Continuous Batching (VLMs) <https://github.com/quic/efficient-transformers/pull/610>`_
You can remove Continuous Batching (VLMs).
docs/source/supported_features.rst
Outdated
- Implements a blocked K/V cache layout so attention reads/processes the cache block-by-block, improving long-context decode performance.
* - `Memory Profiling Tool <https://github.com/quic/efficient-transformers/pull/674>`_
- Adds scripts to profile memory during export/compile/infer (peak usage, cache footprint) for quicker diagnosis. Refer `sample scripts <https://github.com/quic/efficient-transformers/tree/main/scripts/memory_profiling>`_ for more **details**.
* - `ONNX transform, memory & time optimizations <https://github.com/quic/efficient-transformers/pull/640>`_
Remove it; it's an optimization, not a standalone feature.
docs/source/supported_features.rst
Outdated
- Adds scripts to profile memory during export/compile/infer (peak usage, cache footprint) for quicker diagnosis. Refer `sample scripts <https://github.com/quic/efficient-transformers/tree/main/scripts/memory_profiling>`_ for more **details**.
* - `ONNX transform, memory & time optimizations <https://github.com/quic/efficient-transformers/pull/640>`_
- Adds periodic memory cleanup (e.g., to FP16ClipTransform / SplitTensorsTransform) during large-tensor processing, and avoids redundant external data loading when already present.
* - Onboarding Guide
remove
docs/source/supported_features.rst
Outdated
- Extended to Vision Language Models with multi-image handling capabilities, optimizing throughput and latency by dynamically batching requests with varying image counts. Refer `sample script <https://github.com/quic/efficient-transformers/blob/main/examples/image_text_to_text/models/granite_vision/continuous_batching.py>`_ for more **details**.
* - `BlockedKV attention in CausalLM <https://github.com/quic/efficient-transformers/pull/618>`_
- Implements a blocked K/V cache layout so attention reads/processes the cache block-by-block, improving long-context decode performance.
* - `Memory Profiling Tool <https://github.com/quic/efficient-transformers/pull/674>`_
remove
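For context on the "BlockedKV attention" rows in the diffs above: the idea is to store the K/V cache in fixed-size blocks and have attention walk it block by block rather than as one contiguous buffer. A toy sketch of that layout in plain Python (hypothetical block size and helper names; not the PR's actual implementation):

```python
BLOCK_SIZE = 4  # tokens per cache block (hypothetical choice for illustration)

def to_blocks(cache, block_size=BLOCK_SIZE):
    """Split a flat per-token K/V list into fixed-size blocks; the last block may be short."""
    return [cache[i:i + block_size] for i in range(0, len(cache), block_size)]

def iter_blockwise(blocks):
    """Yield cached entries block by block, as a blocked attention pass would read them."""
    for block in blocks:
        yield from block

cache = [f"kv{t}" for t in range(10)]  # stand-ins for 10 cached K/V entries
blocks = to_blocks(cache)
print(len(blocks))  # -> 3 (blocks of 4, 4, and 2 tokens)
```

The blocked layout keeps each read confined to one block, which is what enables the long-context decode gains the feature description claims.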
| Architecture | Model Family | Representative Models | [vLLM Support](https://quic.github.io/cloud-ai-sdk-pages/latest/Getting-Started/Installation/vLLM/vLLM/index.html) |
|-------------------------|--------------------|--------------------------------------------------------------------------------------|--------------|
| **FalconForCausalLM** | Falcon** | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) | ✔️ |
| **MolmoForCausalLM** | Molmo① | [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924) | ✕ |
Check with @quic-vargupt to confirm whether these models are supported on vLLM, and update accordingly.
Signed-off-by: Abukhoyer Shaik <[email protected]>

This PR updates the documentation for the 1.21.0 release.
Note: first draft.