Added Infra in QEfficient for execution of swiftkv models #314
base: main
Conversation
…e is not present at hugging face and checkpoint like swiftkv Signed-off-by: Hem Agnihotri <[email protected]>
Signed-off-by: Hem Agnihotri <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Signed-off-by: Hem Agnihotri <[email protected]>
Signed-off-by: Amit Raj <[email protected]>
Signed-off-by: Amit Raj <[email protected]>
super().__init__()
self.hidden_size = config.hidden_size
self.attention_dropout = config.attention_dropout
self.hidden_size = config.hidden_size
nit: please remove the redundant self.hidden_size initialization
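A minimal sketch of the deduplicated initializer, keeping only the attributes visible in the excerpt (the class name and the layer_idx argument are assumptions for illustration):

import torch.nn as nn

class QEffLlamaSwiftKVAttention(nn.Module):  # hypothetical class name
    def __init__(self, config, layer_idx: int):
        super().__init__()
        self.hidden_size = config.hidden_size              # assigned once
        self.attention_dropout = config.attention_dropout
        self.layer_idx = layer_idx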
def forward(
    self,
    hidden_states: torch.Tensor,
    position_ids,
nit: please add type hints for position_ids and attention_mask for consistency. (Applicable to the classes below as well.)
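For example, a sketch of the annotated signature (the Optional types and defaults are assumptions; only the parameter names come from the excerpt):

from typing import Optional, Tuple
import torch

def forward(
    self,
    hidden_states: torch.Tensor,
    position_ids: torch.LongTensor,
    attention_mask: Optional[torch.Tensor] = None,
    past_key_value: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
) -> torch.Tensor:
    ...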
"for auto-regressive decoding with k/v caching, please make sure to initialize the attention class " | ||
"with a layer index." | ||
) | ||
kv_seq_len = past_key_value.get_usable_length(kv_seq_len, self.layer_idx) |
Is this the right usage? According to the type hint, past_key_value is Tuple[torch.Tensor, torch.Tensor].
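get_usable_length() comes from the transformers Cache API and read_only() is added on QEffDynamicCache in this PR, so if the object passed in is really a cache instance, one option (a sketch under that assumption) is to widen the annotation rather than keep the tuple hint:

from typing import Optional
import torch
from transformers.cache_utils import DynamicCache  # QEffDynamicCache subclasses this per the diff below

def forward(
    self,
    hidden_states: torch.Tensor,
    position_ids: torch.LongTensor,
    past_key_value: Optional[DynamicCache] = None,  # was Optional[Tuple[torch.Tensor, torch.Tensor]]
):
    kv_seq_len = hidden_states.shape[-2]
    if past_key_value is not None:
        kv_seq_len = past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
    return kv_seq_len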
)
kv_seq_len = past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
cache_kwargs = {"position_ids": position_ids, "batch_index": batch_index}
key_states, value_states = past_key_value.read_only(self.layer_idx, cache_kwargs=cache_kwargs)
same comment as above
key_states = repeat_kv(key_states, self.num_key_value_groups)
value_states = repeat_kv(value_states, self.num_key_value_groups)

attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
Is this attention implementation still applicable with transformers 4.50?
self,
hidden_states: torch.Tensor,
position_ids,
past_key_value: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
Ensure the naming consistency of the past_key_value parameter across all classes. In some classes, it is referred to as past_key_value, while in others, it is named past_key_values
@@ -36,6 +36,58 @@ class QEffDynamicCache(DynamicCache):
    """

    def write_only(self, key_states, value_states, layer_idx, cache_kwargs):
        # Update the cache
please add docstrings for both methods
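For instance, a docstring sketch for write_only (the behavioral description and argument meanings are assumptions inferred from how the cache is used in the attention code above; read_only would get an analogous docstring):

def write_only(self, key_states, value_states, layer_idx, cache_kwargs):
    """
    Write key_states and value_states into the cache of layer `layer_idx` without
    returning the concatenated past states (unlike DynamicCache.update()).

    Args:
        key_states (torch.Tensor): new key states for this layer.
        value_states (torch.Tensor): new value states for this layer.
        layer_idx (int): index of the layer whose cache is written.
        cache_kwargs (dict): extra inputs such as position_ids and batch_index.
    """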
@@ -33,6 +33,7 @@
| **Phi3ForCausalLM** | Phi-3, Phi-3.5 | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | ✔️ |
| **QwenForCausalLM** | DeepSeek-R1-Distill-Qwen | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ✔️ |
| | Qwen2, Qwen2.5 | [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) | ✔️ |
| **LlamaSwiftKVForCausalLM** | swiftkv | [Snowflake/Llama-3.1-SwiftKV-8B-Instruct](https://huggingface.co/Snowflake/Llama-3.1-SwiftKV-8B-Instruct) | ✔️ |
please also update the feature support under the Supported Features section of docs/source/quick_start.md
Added infra in QEfficient for execution of models whose modeling file is not present on Hugging Face, such as SwiftKV checkpoints.
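As a usage sketch of the new path (QEFFAutoModelForCausalLM is the existing QEfficient entry point; the compile and generate arguments shown are assumptions, not part of this PR):

from transformers import AutoTokenizer
from QEfficient import QEFFAutoModelForCausalLM

model_name = "Snowflake/Llama-3.1-SwiftKV-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The SwiftKV checkpoint ships no modeling file on the Hub; QEfficient now supplies
# the LlamaSwiftKV modeling code itself, so the usual export/compile/run flow applies.
model = QEFFAutoModelForCausalLM.from_pretrained(model_name)
model.compile(num_cores=16)                                   # illustrative compile settings
model.generate(prompts=["Hello, world"], tokenizer=tokenizer)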