
[Prototype] Option to configure layers independently #168


Draft

jlamypoirier wants to merge 10 commits into main
Conversation

jlamypoirier
Collaborator

@jlamypoirier jlamypoirier commented Mar 3, 2025

✨ Description

Fixes: #154, #155.

This PR proposes a simple way to obtain layer-dependent configuration by leveraging Fast-LLM's existing config update mechanism. It works by providing a "default" layer configuration (same as before), and optional overrides for specified layer ranges.
See tests/test_transformer.py for examples.
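
For illustration, a layer-dependent config could look roughly like the sketch below (written as a plain Python dict, roughly what a YAML config would deserialize to). Only the layers/default location comes from this PR; the names used for the override ranges (overrides, begin, end, updates) and the parameter values are placeholders, not the actual schema — see tests/test_transformer.py for the real syntax.

```python
# Rough, hypothetical shape of a layer-dependent config. Only `layers/default`
# is taken from the PR description; the override-range field names are guesses.
config = {
    "layers": {
        # The "default" layer config, same content as the old `transformer` section.
        "default": {
            "num_attention_heads": 32,
            "window_size": None,
        },
        # Hypothetical override entries: each targets a range of layer indices
        # and lists the config updates to apply there.
        "overrides": [
            {"begin": 0, "end": 4, "updates": {"window_size": 4096}},
        ],
    },
}
```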

The feature works, but it is admittedly far from perfect, and I have some concerns about user-friendliness:

  • The update syntax supports both single-key and full-dict overrides, e.g. "normalization/epsilon": 1 overrides only the normalization epsilon, while "normalization": {"epsilon": 1} overrides the entire dict, i.e., everything other than epsilon reverts to its default value. This could be confusing and needs to be well documented (see the sketch after this list).
  • If a layer index is covered by multiple update ranges, only the first update is applied. This could be confusing to some users. (Another option would be to apply them all in order; I'm not sure which is better.)
  • Simple cases should be easy to understand, but this feature is extremely powerful and enables users to do all kinds of crazy things that may be confusing or lead to unexpected behaviour.
  • I moved the transformer config from transformer to layers/default, which adds a small amount of complexity when not using the feature. (We could probably revert that change though.)
  • For some parameters (num_layers, hidden_size, full_precision_residual), overriding doesn't really make sense. I left them as-is and added assertions, but we may want to think about moving them out of the layer config.
  • I disabled layer-dependent rotary embeddings until we add support for it in the rotary preprocessor.
  • TensorSpace wasn't designed for this kind of thing. I made a quick fix using a hierarchy of tensor spaces, but I'm not sure about its long-term viability.
  • The feature makes conversion more complicated; I had to explicitly prevent conversion for any kind of layer-dependent configuration. We'll need to address this before using the feature in practice.
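
To make the first two points above concrete, here is a minimal, self-contained sketch of the update semantics described there: single-key vs. full-dict overrides, and first-match-wins for overlapping ranges. It operates on a made-up dict config and is not the actual Fast-LLM update code; all field names are placeholders.

```python
import copy

# Made-up default layer config, for illustration only.
DEFAULT_LAYER_CONFIG = {
    "normalization": {"type": "layer_norm", "epsilon": 1e-5},
    "window_size": None,
}

def apply_updates(config, updates):
    """Apply a flat update dict to a copy of `config`.

    A key containing "/" (e.g. "normalization/epsilon") updates a single nested
    field; a plain key (e.g. "normalization") replaces the whole sub-dict, so any
    field not listed in the new value is dropped (i.e. reverts to its default).
    """
    result = copy.deepcopy(config)
    for key, value in updates.items():
        if "/" in key:
            outer, inner = key.split("/", 1)
            result[outer][inner] = value  # merge: only this nested field changes
        else:
            result[key] = value  # replace: the whole value is overwritten
    return result

def config_for_layer(layer_index, default, override_ranges):
    """Return the config for one layer; only the first matching range applies."""
    for begin, end, updates in override_ranges:
        if begin <= layer_index < end:
            return apply_updates(default, updates)
    return copy.deepcopy(default)

# "normalization/epsilon" keeps type="layer_norm"; passing a full
# {"normalization": {"epsilon": 1e-6}} dict would drop the type instead.
print(config_for_layer(2, DEFAULT_LAYER_CONFIG,
                       [(0, 4, {"normalization/epsilon": 1e-6})]))
print(config_for_layer(8, DEFAULT_LAYER_CONFIG,
                       [(0, 4, {"normalization/epsilon": 1e-6})]))
```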

This feature removes the need for max_window_layers, but I kept it for now because of the likely conversion issues. @bigximik I also added back support for backup windowed attention and fixed the layer range by shifting the layer index (see comments in #157).

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@jlamypoirier jlamypoirier changed the title from "[Prototype] Option to configure layers independently" to "Option to configure layers independently" on Mar 5, 2025
@jlamypoirier jlamypoirier marked this pull request as ready for review March 6, 2025 00:47
@tscholak
Collaborator

tscholak commented Mar 6, 2025

Hey @jlamypoirier. This introduces a lot of complexity, and based on your own comments, it is not yet user-friendly or fully supported. Given that this feature is not urgent, I'd prefer we leave it unmerged until the conversion and usability concerns are properly addressed. The immediate priority is LoRA. Thanks.

@jlamypoirier
Collaborator Author

Agreed this is not entirely ready, but the feature is relatively small and locked behind an experimental flag, so there wouldn't be any harm in merging it so we can play with it until we have something better (we already have a need for it).
And either way, we need to address the issues from #157.

@tscholak tscholak mentioned this pull request Mar 10, 2025
@jlamypoirier jlamypoirier changed the title from "Option to configure layers independently" to "[Prototype] Option to configure layers independently" on Mar 13, 2025
@jlamypoirier jlamypoirier marked this pull request as draft April 17, 2025 16:21