[Prototype] Option to configure layers independently #168
Closed
jlamypoirier wants to merge 10 commits into main from
Conversation
This was referenced Mar 6, 2025
Collaborator
Hey @jlamypoirier. This introduces a lot of complexity, and based on your own comments, it is not yet user-friendly or fully supported. Given that this feature is not urgent, I'd prefer we leave it unmerged until the conversion and usability concerns are properly addressed. The immediate priority is LoRA. Thanks.
Collaborator
Author
Agreed this is not entirely ready, but the feature is relatively small and locked behind an experimental flag, so there wouldn't be any harm in merging it so we can play with it until we have something better (we already have a need for it).
✨ Description
Fixes: #154, #155.
This PR proposes a simple way to obtain layer-dependent configuration by leveraging Fast-LLM's existing config update mechanism. It works by providing a "default" layer configuration (same as before), and optional overrides for specified layer ranges.
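To illustrate the idea, here is a minimal, self-contained sketch of default-plus-range-override semantics. This is not the actual Fast-LLM implementation; the helper names (`set_path`, `layer_config`) and the example config keys are hypothetical, chosen only to show how a `/`-separated path override touches a single leaf while a dict override replaces the whole sub-config:

```python
import copy

def set_path(config, path, value):
    """Apply a '/'-separated path override, e.g. "normalization/epsilon".

    Assigning a dict value replaces the entire sub-config at that path,
    which reproduces the two override styles described above.
    """
    keys = path.split("/")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = copy.deepcopy(value)

def layer_config(default, overrides, layer_index):
    """Build one layer's config: start from the default, then apply every
    override whose half-open range [begin, end) contains layer_index."""
    config = copy.deepcopy(default)
    for (begin, end), update in overrides:
        if begin <= layer_index < end:
            for path, value in update.items():
                set_path(config, path, value)
    return config

default = {"normalization": {"type": "rms_norm", "epsilon": 1e-5},
           "window_size": None}
overrides = [
    ((0, 4), {"normalization/epsilon": 1e-3}),       # path: keeps "type"
    ((4, 8), {"normalization": {"epsilon": 1e-3}}),  # dict: drops "type"
]

print(layer_config(default, overrides, 0)["normalization"])
# {'type': 'rms_norm', 'epsilon': 0.001}
print(layer_config(default, overrides, 5)["normalization"])
# {'epsilon': 0.001}
print(layer_config(default, overrides, 9)["normalization"])
# {'type': 'rms_norm', 'epsilon': 1e-05}  (no range matched)
```

Note how layers 0 and 5 end up with the same epsilon but different surviving keys; that asymmetry between the path-style and dict-style overrides is exactly the user-friendliness concern raised below.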
See `tests/test_transformer.py` for examples.

The thing works, but is admittedly far from perfect, and I do have some concerns about user-friendliness:

- `"normalization/epsilon": 1` overrides only the normalization epsilon, while `"normalization": {"epsilon": 1}` overrides the entire dict, i.e., everything other than `epsilon` reverts to its default value. This could be confusing and needs to be well documented.
- `transformer` moved to `layers/default`, which adds a small amount of complexity when not using the feature. (We could probably revert that change though.)
- For some parameters (`num_layers`, `hidden_size`, `full_precision_residual`), overriding doesn't really make sense. I left them as-is and added assertions, but we may want to think about moving them away from the layer config.
- `TensorSpace` wasn't designed for that kind of thing. I made a quick fix using a hierarchy of tensor spaces, but I'm not sure about its long-term viability.

This feature removes the need for
`max_window_layers`, but I kept it for now because of the likely conversion issues. @bigximik I also added back support for backup windowed attention and fixed the layer range by shifting the layer index (see comments in #157).

🔍 Type of change
Select all that apply: