
Added Infra in QEfficient for execution of swiftkv models #314

Open · wants to merge 6 commits into base: main

Conversation

quic-hemagnih (Contributor)

Added infra in QEfficient for execution of models, such as SwiftKV, whose modelling file is not present on Hugging Face but whose checkpoint is available.
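
A hypothetical usage sketch of what this infra is meant to enable; QEFFAutoModelForCausalLM and the exact loading path are assumptions, and the checkpoint name is the one listed in the docs table added by this PR:

    from QEfficient import QEFFAutoModelForCausalLM

    # Assumed flow: the SwiftKV modelling code registered by this PR is picked up
    # when loading a checkpoint whose architecture is not shipped with transformers.
    model = QEFFAutoModelForCausalLM.from_pretrained("Snowflake/Llama-3.1-SwiftKV-8B-Instruct")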

@quic-amitraj quic-amitraj self-requested a review March 13, 2025 12:01
@quic-hemagnih quic-hemagnih marked this pull request as ready for review March 31, 2025 09:24
quic-hemagnih and others added 5 commits April 2, 2025 10:45
…e is not present at hugging face and checkpoint like swiftkv

Signed-off-by: Hem Agnihotri <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Signed-off-by: Amit Raj <[email protected]>
Signed-off-by: Amit Raj <[email protected]>
super().__init__()
self.hidden_size = config.hidden_size
self.attention_dropout = config.attention_dropout
self.hidden_size = config.hidden_size
Contributor:

nit: please remove the redundant self.hidden_size initialization
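
A minimal sketch of the suggested cleanup; the class name and layer_idx parameter are illustrative, only the attribute assignments come from the snippet above:

    import torch.nn as nn

    class SwiftKVAttentionSketch(nn.Module):  # illustrative name, not the class in this PR
        def __init__(self, config, layer_idx: int):
            super().__init__()
            self.layer_idx = layer_idx
            self.hidden_size = config.hidden_size            # assigned once
            self.attention_dropout = config.attention_dropout
            # the duplicate self.hidden_size assignment is dropped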

def forward(
self,
hidden_states: torch.Tensor,
position_ids,
Contributor:

nit: please add type hints for position_ids and attention_mask for consistency. (Applicable to the classes below as well.)
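
A sketch of the requested hints, assuming the usual decoder-attention signature; the exact parameter set in the PR may differ:

    from typing import Optional, Tuple

    import torch

    def forward(
        self,
        hidden_states: torch.Tensor,
        position_ids: torch.LongTensor,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_value: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
        batch_index: Optional[torch.LongTensor] = None,  # assumed, based on the cache_kwargs used below
    ) -> torch.Tensor:
        ...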

"for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
"with a layer index."
)
kv_seq_len = past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
Contributor:

Is this the right usage? According to the type hint, past_key_value is Tuple[torch.Tensor, torch.Tensor], which has no get_usable_length method.
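
If past_key_value is in fact a cache object, as get_usable_length and read_only below suggest, the annotation would need to say so; a sketch under that assumption:

    from typing import Optional

    import torch
    from transformers.cache_utils import Cache

    def forward(
        self,
        hidden_states: torch.Tensor,
        position_ids: torch.LongTensor,
        past_key_value: Optional[Cache] = None,  # a Cache, not Tuple[torch.Tensor, torch.Tensor]
    ) -> torch.Tensor:
        ...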

)
kv_seq_len = past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
cache_kwargs = {"position_ids": position_ids, "batch_index": batch_index}
key_states, value_states = past_key_value.read_only(self.layer_idx, cache_kwargs=cache_kwargs)
Contributor:

Same comment as above: the declared Tuple type hint does not match this cache-style usage.

key_states = repeat_kv(key_states, self.num_key_value_groups)
value_states = repeat_kv(value_states, self.num_key_value_groups)

attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
Contributor:

Is this attention implementation still applicable with transformers 4.50?
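
For reference, a self-contained sketch of the eager attention math shown above, with repeat_kv reproduced via repeat_interleave; whether recent transformers releases still dispatch Llama attention through this eager path is exactly the question being asked:

    import math

    import torch

    def eager_attention(query_states, key_states, value_states, attention_mask,
                        head_dim: int, num_key_value_groups: int):
        # Expand KV heads to match the number of query heads (equivalent to repeat_kv).
        key_states = key_states.repeat_interleave(num_key_value_groups, dim=1)
        value_states = value_states.repeat_interleave(num_key_value_groups, dim=1)
        # Scaled dot-product scores, as in the line under review.
        attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(head_dim)
        if attention_mask is not None:
            attn_weights = attn_weights + attention_mask
        attn_weights = torch.nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
        return torch.matmul(attn_weights, value_states)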

self,
hidden_states: torch.Tensor,
position_ids,
past_key_value: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
Contributor:

Ensure naming consistency for the past_key_value parameter across all classes: in some classes it is named past_key_value, while in others it is past_key_values.

@@ -36,6 +36,58 @@ class QEffDynamicCache(DynamicCache):

"""

def write_only(self, key_states, value_states, layer_idx, cache_kwargs):
# Update the cache
Contributor:

please add docstrings for both methods
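
A sketch of the requested docstrings; the semantics (write without returning the accumulated states, read without updating) are inferred from the method names and call sites above, so the final wording should follow the actual implementation:

    def write_only(self, key_states, value_states, layer_idx, cache_kwargs=None):
        """Write `key_states` / `value_states` into the cache for `layer_idx`
        without returning the accumulated key/value states.

        Args:
            key_states (torch.Tensor): new key states to store.
            value_states (torch.Tensor): new value states to store.
            layer_idx (int): index of the decoder layer being updated.
            cache_kwargs (dict, optional): extra update arguments, e.g.
                position_ids and batch_index.
        """

    def read_only(self, layer_idx, cache_kwargs=None):
        """Return the cached key/value states for `layer_idx` without
        modifying the cache.

        Args:
            layer_idx (int): index of the decoder layer to read.
            cache_kwargs (dict, optional): extra read arguments, e.g.
                position_ids and batch_index.

        Returns:
            Tuple[torch.Tensor, torch.Tensor]: cached key and value states.
        """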

@@ -33,6 +33,7 @@
| **Phi3ForCausalLM** | Phi-3, Phi-3.5 | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | ✔️ |
| **QwenForCausalLM** | DeepSeek-R1-Distill-Qwen | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ✔️ |
| | Qwen2, Qwen2.5 | [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) | ✔️ |
| **LlamaSwiftKVForCausalLM** | swiftkv | [Snowflake/Llama-3.1-SwiftKV-8B-Instruct](https://huggingface.co/Snowflake/Llama-3.1-SwiftKV-8B-Instruct) | ✔️ |
Contributor:

please also update the Supported Features section in docs/source/quick_start.md
