Qwen2 converter #163

Merged
merged 12 commits into main from qwen2 on Mar 4, 2025

Conversation

bigximik (Contributor) commented:

✨ Description

Implements the dense Qwen2 checkpoint converter, following the Hugging Face Transformers Qwen2 implementation.

The use_sliding_window, sliding_window, and max_window_layers parameters from the HF Qwen2 configuration are ignored during conversion, since they are not architecture parameters. This mirrors how the sliding_window parameter is handled in the Mistral checkpoint converter; a sketch of the pattern is shown below.
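
To make the ignore-on-import pattern concrete, here is a minimal, self-contained sketch. The ParamConverter base below is a simplified stand-in written for this example (the real base class in the repository has a richer interface), and the exported values are the HF Qwen2 defaults, shown for illustration only:

import dataclasses


# Simplified stand-in for the repository's ParamConverter base class,
# defined here only so the sketch is self-contained and runnable.
@dataclasses.dataclass
class ParamConverter:
    def export_params(self, fast_llm_values: tuple) -> tuple:
        raise NotImplementedError()

    def import_params(self, export_values: tuple) -> tuple:
        raise NotImplementedError()


@dataclasses.dataclass
class IgnoreImportQwen2SlidingWindowParamsConverter(ParamConverter):
    # Drops use_sliding_window, sliding_window, and max_window_layers when
    # importing an HF Qwen2 config: they have no architecture-parameter
    # counterpart, mirroring how Mistral's sliding_window is handled.

    def import_params(self, export_values: tuple) -> tuple:
        # Discard whatever values the HF config provided; nothing is imported.
        return ()

    def export_params(self, fast_llm_values: tuple) -> tuple:
        # No converted parameter maps back to these HF fields, so emit the
        # HF Qwen2 defaults (use_sliding_window=False, sliding_window=4096,
        # max_window_layers=28) to keep the exported config loadable.
        return (False, 4096, 28)

The asymmetry is the point of the pattern: imports silently drop the fields, while exports write fixed defaults so the produced HF config remains valid.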

Part of #135.

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

  1. Implements dense Qwen2 converter

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • 📜 I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • 🎉 The functionality is complete, and I have tested the changes.
  • 📝 I have updated the documentation if needed. (not applicable)
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

  • 🐋 I have updated the Docker configuration or dependencies, if applicable. (not applicable)
  • 🔄 I have ensured compatibility with the existing setup after dependency changes. (not applicable)

Testing

  • 🧪 I have added or updated tests to cover my changes.
  • ✔️ New and existing tests pass locally with my changes. (tested the affected llama, starcoder2, and qwen2 conversions)
  • 🚦 I have tested these changes on GPUs and verified training stability.
  • 🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact (not applicable)

@tscholak (Collaborator) left a comment:

Great work, @bigximik!
I think next in line is #166, wdyt?



@dataclasses.dataclass
class IgnoreImportQwen2SlidingWindowParamsConverter(ParamConverter):
Review comment (Collaborator):

@bigximik this is fine, but can you please add a TODO here saying that this is a temporary hack until we can load these params from the config?
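
For illustration, the requested TODO might read as follows; the wording is only a suggestion:

@dataclasses.dataclass
class IgnoreImportQwen2SlidingWindowParamsConverter(ParamConverter):
    # TODO: Temporary hack; remove once these params can be loaded from the config.
    ...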

@tscholak merged commit 23006dc into main on Mar 4, 2025
4 checks passed
@tscholak deleted the qwen2 branch on Mar 4, 2025 at 13:54