🎯 Goal (What & Why)
Fast-LLM creates gradient and optimizer state buffers for all parameters, even if they are frozen. This degrades both memory usage and the speedup from freezing weights, and is a blocker for LoRA (#149).
🚀 Execution Plan
Step 1: What is the smallest working version?
Create a separate buffer for frozen weights that doesn't hold gradients. It can be stored in training precision, and with ZeRO-3 it will need to be restored separately.
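As a rough illustration of Step 1, here is a minimal PyTorch-style sketch (not Fast-LLM's actual buffer code; the function name and layout are hypothetical): frozen parameters go into their own flat, training-precision buffer that carries no gradient buffer and no optimizer state, while trainable parameters keep the usual gradient buffer and full-precision copy.

```python
import torch

def build_weight_buffers(params, training_dtype=torch.bfloat16):
    """Hypothetical sketch (not Fast-LLM's actual API): keep frozen weights in
    their own flat buffer with no gradient buffer and no optimizer state."""
    trainable = [p for p in params if p.requires_grad]
    frozen = [p for p in params if not p.requires_grad]

    def flatten(ps):
        if not ps:
            return torch.empty(0, dtype=training_dtype)
        return torch.cat([p.detach().reshape(-1) for p in ps]).to(training_dtype)

    # Trainable weights: training-precision buffer + gradient buffer +
    # full-precision copy for the optimizer.
    trainable_buffer = flatten(trainable)
    grad_buffer = torch.zeros_like(trainable_buffer)
    full_precision_shard = trainable_buffer.to(torch.float32)

    # Frozen weights: training-precision buffer only. With ZeRO-3 this buffer
    # is sharded too and has to be restored (gathered) separately.
    frozen_buffer = flatten(frozen)

    return trainable_buffer, grad_buffer, full_precision_shard, frozen_buffer
```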
Step 2: What additional optimizations are possible (later, out-of-scope for now)?
Avoid storing a separate full-precision copy (shard) of the frozen weights when the training-precision copy is enough. This would prevent excessive state-memory usage when using a small number of GPUs (up to ~3x for a single GPU); see the rough arithmetic after this list.
Avoid reconstructing the frozen weights on every training step when they don't need to be. This would save a large amount of unnecessary communication and potential network overhead with ZeRO-1/2; a possible caching sketch also follows below.
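To make the single-GPU figure concrete, here is a back-of-the-envelope calculation under assumed precisions (bf16 training precision and an fp32 full-precision shard; these byte sizes are assumptions, not measurements from Fast-LLM):

```python
# Bytes per frozen parameter on a single GPU, assuming bf16 training
# precision (2 bytes) and an fp32 full-precision shard (4 bytes).
bf16_bytes, fp32_bytes = 2, 4

with_fp32_shard = bf16_bytes + fp32_bytes  # 6 bytes: bf16 copy + fp32 shard
training_precision_only = bf16_bytes       # 2 bytes: bf16 copy is enough

print(with_fp32_shard / training_precision_only)  # 3.0 -> the "up to ~3x" figure
```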
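For the second optimization, one possible shape is sketched below with hypothetical names (Fast-LLM's actual sharding and reconstruction code will differ): since frozen weights never change between steps, the reconstructed copy can be built once and reused instead of being rebuilt on every step.

```python
class FrozenWeightCache:
    """Hypothetical sketch: frozen weights don't change between steps, so the
    reconstructed (gathered) copy can be built once and reused, skipping the
    per-step communication that trainable weights still need."""

    def __init__(self):
        self._gathered = None

    def get(self, gather_fn):
        if self._gathered is None:
            # e.g. gather_fn() would gather/reconstruct the frozen shards once.
            self._gathered = gather_fn()
        return self._gathered
```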
📌 Acceptance Criteria (Must-Haves for Completion)
🛠️ Project Management
Set the Estimate field (in days) in the GitHub project.
Use the Size field to categorize the PR size (Small/Medium/Large).