Open
Labels
- module: doc: Issues related to documentation, both in docs/ and inlined in code
- module: llm: Issues related to LLM examples and apps, and to the extensions/llm/ code
- module: user experience: Issues related to reducing friction for users
- triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Description
📚 The doc issue
We hope to provide a standardized and streamlined flow for running LLMs from HF, as well as for individually enabled models (Llama). However, there will be use cases where someone wants to enable a model that doesn't fit cleanly into one of these flows. Maybe it has a slightly different architecture and can't drop into our transformer definition. I ran into this recently when working with a Fairseq encoder/decoder language translation model.
I'd like to create documentation that allows a power user to understand the following:
- Why do the optimized ET transformer implementations work? What bits are critical for performance, export compliance, etc.?
- If I have a custom transformer implementation that doesn't map exactly to the ET preferred versions, what do I need to do to make it usable with ET?
    a) How do I handle attention and KV cache mutability?
    b) Can I leverage the ET SDPA ops?
    c) How can I use the building blocks / composable components from the extension/llm directory? (Maybe we point to torchtune, as well).
    d) What do I need to do to optimize for specific backends, such as XNNPACK or CoreML?
CC @larryliu0820 @byjlw @mergennachin
Suggest a potential alternative/fix
No response