Skip to content

LoRA: runtime toggle and PEFT adapter loader#316

Merged
davidkoski merged 9 commits into
ml-explore:mainfrom
smdesai:lora-toggle-and-peft
May 27, 2026
Merged

LoRA: runtime toggle and PEFT adapter loader#316
davidkoski merged 9 commits into
ml-explore:mainfrom
smdesai:lora-toggle-and-peft

Conversation

@smdesai

@smdesai smdesai commented May 27, 2026

Copy link
Copy Markdown
Contributor

Proposed changes

This PR adds two related, backward-compatible improvements to the LoRA infrastructure in MLXLMCommon.

Runtime loraEnabled toggle on LoRALayer
A new loraEnabled: Bool property on the LoRALayer protocol lets callers enable or disable the LoRA term at runtime without unloading the adapter. When false, the layer behaves as the underlying base layer (no LoRA term added).

This is needed for inference patterns that interleave LoRA-on and LoRA-off forward passes against the same model — for example, speculative decoding schemes where a LoRA-tuned drafter feeds an un-tuned verifier with a shared KV cache. Today the only way to "disable" a loaded adapter is to unload and reload it, which is too expensive to do per inference step.

Backward compatibility: Strictly additive. The default value is true (LoRA always applied, matching pre-PR behavior). External LoRALayer conformers compile unchanged because the protocol-extension default satisfies the new requirement; their toggle is silently a no-op until they opt in by adding their own stored property.

LoRAContainer.fromPEFT(directory:) — load HuggingFace PEFT adapters
A new static loader on LoRAContainer reads adapter directories in the standard HuggingFace peft format:

  • adapter_config.json (PEFT schema: r, lora_alpha, target_modules, peft_type, …)
  • adapter_model.safetensors (with base_model.model..lora_A.weight / lora_B.weight keys)

These changes are extracted from PR #310 where the model's speculative-decoding mode requires per-phase LoRA toggling and the canonical adapter ships in PEFT format.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@smdesai smdesai mentioned this pull request May 27, 2026
4 tasks
Comment on lines +114 to +115
if let bias { return y + bias }
return y

@davidkoski davidkoski May 27, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should call super? Linear is simple enough and unlikely to change, but since it is a subtype it might be better to call that way. It looks like LoRALinear does it that way -- I think that is the pattern to follow

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're right, it can and updated accordingly.

Comment on lines +27 to +30
// Conversion:
// - Strip the leading `base_model.model.` prefix.
// - Rename `.lora_A.weight` -> `.lora_a`, `.lora_B.weight` -> `.lora_b`.
// - Transpose both tensors to match MLX's [in, r] / [r, out] convention.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this block comment should be on fromPEFT? As it is you can only see it in the code, not in the built docs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It's moved to fromPEFT.

// Match "<encoder|model>.layers.<n>." then return the rest. This
// matches the two common backbone layouts the project uses.
let parts = path.split(separator: ".", omittingEmptySubsequences: false)
for i in 0 ..< (parts.count - 2) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if parts.count is 0 or 1 this will trap -- might want to guard vs that and return nil.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Yes a guard needs to be in place. It's added.

@davidkoski davidkoski left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

@davidkoski davidkoski merged commit 5626257 into ml-explore:main May 27, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants