
Support frozen weights #183


Closed
1 of 4 tasks
jlamypoirier opened this issue Mar 11, 2025 · 0 comments · Fixed by #185
Assignees
Labels
enhancement New feature or request

Comments

@jlamypoirier
Collaborator

jlamypoirier commented Mar 11, 2025

🎯 Goal (What & Why)

Fast-LLM creates gradient and optimizer-state buffers for all parameters, even frozen ones. This wastes memory, limits the speedup expected from freezing weights, and is a blocker for LoRA (#149).
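A minimal sketch of the intended behavior in plain PyTorch (names are illustrative, not Fast-LLM's actual internals): building the optimizer over trainable parameters only means no gradient or optimizer-state buffers are ever allocated for frozen weights.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.Linear(64, 64),
)

# Freeze the first layer; its parameters need no gradient or optimizer state.
for p in model[0].parameters():
    p.requires_grad_(False)

# Pass only trainable parameters to the optimizer, so no momentum/variance
# buffers are created for the frozen weights.
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.Adam(trainable)

loss = model(torch.randn(8, 64)).sum()
loss.backward()
opt.step()

# Frozen parameters never received a gradient buffer.
assert model[0].weight.grad is None
assert model[1].weight.grad is not None
```

Fast-LLM manages flat gradient and optimizer-state shards itself rather than relying on per-tensor `requires_grad`, so the actual change is in how those buffers are laid out, but the memory accounting is the same.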

🚀 Execution Plan

Step 1: What is the smallest working version?

Step 2: What additional optimizations are possible (later, out of scope for now)?

  • Avoid storing a separate full-precision copy (shard) of the frozen weights when the 2-byte (16-bit) copy is enough. This prevents excessive state memory usage with a small number of GPUs (up to ~3x savings for single-GPU).
  • Avoid reconstructing the frozen weights on every training step when they haven't changed. This saves a large amount of unnecessary communication and potential network overhead with ZeRO-1/2.
  • Weight freezing is not considered part of the architecture, yet it necessarily changes the weight layout. We'll need additional safety checks to avoid accidental misuse (e.g. loading distributed checkpoints in the wrong format). Note: this further blurs the architecture/non-architecture split, making Missing configuration when converting from HF model config json #166 and [Prototype] Make the model config override the pretrained config #171 more relevant.
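A back-of-the-envelope check of the ~3x single-GPU figure from the first bullet, assuming 2-byte (bf16/fp16) weights alongside a 4-byte (fp32) full-precision shard (the exact dtypes are an assumption for illustration):

```python
# Bytes per frozen parameter, single GPU.
BF16, FP32 = 2, 4

# Today: the 2-byte weight copy plus a separate full-precision shard.
current = BF16 + FP32   # 6 bytes per frozen parameter

# Proposed: keep only the 2-byte copy for frozen weights.
proposed = BF16         # 2 bytes per frozen parameter

print(current / proposed)  # -> 3.0
```

With more GPUs the full-precision shard is split across ranks, so the per-GPU overhead of the shard shrinks and the relative savings fall below 3x.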

📌 Acceptance Criteria (Must-Haves for Completion)

  • Frozen parameters no longer allocate gradient or optimizer-state buffers, yielding the memory savings and speedups described above.

🛠️ Project Management

  • Assign the project to the Fast-LLM project.
  • Set the Estimate field (in days) in the GitHub project.
  • Use the Size field to categorize the PR size (Small/Medium/Large).
  • Assign an owner when opening the issue.
@jlamypoirier jlamypoirier added the enhancement New feature or request label Mar 11, 2025
@jlamypoirier jlamypoirier self-assigned this Mar 11, 2025
@jlamypoirier jlamypoirier mentioned this issue Mar 11, 2025