Add Qwen 2.5 #2088
base: master
Conversation
Thanks for the PR! Before review, could you please do a forward pass and match the output with HF's Qwen? Also, let's make it a draft PR till then.
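For reference, a minimal sketch of this kind of forward-pass comparison. The HF checkpoint name is illustrative, and `keras_backbone` stands in for the Qwen2Backbone built from the converted weights inside the conversion script; this is not the script's actual code.

```python
import keras
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

hf_name = "Qwen/Qwen2.5-0.5B"  # illustrative checkpoint name
hf_model = AutoModelForCausalLM.from_pretrained(hf_name, torch_dtype=torch.float32)
hf_tokenizer = AutoTokenizer.from_pretrained(hf_name)

token_ids = hf_tokenizer("What is Keras?", return_tensors="pt")["input_ids"]

with torch.no_grad():
    # Hidden states from the HF decoder stack (before the LM head).
    hf_hidden = hf_model.model(token_ids).last_hidden_state.numpy()

# `keras_backbone` stands in for the Qwen2Backbone built from the
# converted weights inside the conversion script.
keras_hidden = keras_backbone(
    {
        "token_ids": token_ids.numpy(),
        "padding_mask": np.ones_like(token_ids.numpy(), dtype=bool),
    }
)

# Loose fp32 check at atol=1e-3; tightening this toward 1e-5 comes up below.
np.testing.assert_allclose(
    keras.ops.convert_to_numpy(keras_hidden), hf_hidden, atol=1e-3
)
```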
Took a cursory glance. Let's do the weight conversion and numerics check first!
Pushed a fix for the code format error.
@shivance - let us know when this PR is ready for review. Thanks!
@abheesht17 I have got the tokenizer working; currently I am working on matching the output of the HF model and the Keras model.
Great, no hurry. Was just checking. Do ping if you hit any blockers :)
@abheesht17 I see that the newer checkpoint conversion script uses a different assignment method instead of the old kernel assign. Has the API changed for assigning the bias as well? Why was the new method created, and what is the difference?
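The snippet being referenced above isn't visible in this thread, so purely as a general illustration (not the specific API the conversion script uses), here are two equivalent ways of assigning converted weights to a Keras layer; bias variables are handled the same way as kernels.

```python
import numpy as np
import keras

layer = keras.layers.Dense(4)
layer.build((None, 8))  # creates kernel of shape (8, 4) and bias of shape (4,)

new_kernel = np.zeros((8, 4), dtype="float32")
new_bias = np.zeros((4,), dtype="float32")

# Per-variable style: assign each variable directly.
layer.kernel.assign(new_kernel)
layer.bias.assign(new_bias)

# Equivalent bulk form: set all of a layer's weights at once, in build order.
layer.set_weights([new_kernel, new_bias])
```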
@abheesht17 Upon weight loading, the outputs look like this! The comparison succeeds at an absolute tolerance of 1e-3. I am testing at fp32, since it's a 0.5B model.
@abheesht17 I have marked this PR as ready for review.
Great. Were you able to bring the difference in numerics down to 1e-5? Might be worth checking layer-by-layer which one's causing an issue.
Edit: never mind, the conversion script is part of the PR. |
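A sketch of the layer-by-layer check suggested above, assuming both sides expose per-layer activations (on the HF side via `output_hidden_states=True`, on the Keras side by calling the backbone's `transformer_layers` one at a time); the helper name is hypothetical.

```python
import numpy as np

def first_divergent_layer(keras_acts, hf_acts, atol=1e-5):
    """Compare per-layer activations and report where the two stacks diverge.

    Both arguments are lists of numpy arrays, one entry per decoder layer,
    collected from the same token ids.
    """
    for i, (k, h) in enumerate(zip(keras_acts, hf_acts)):
        max_diff = np.abs(k - h).max()
        if not np.allclose(k, h, atol=atol):
            print(f"Layer {i}: max abs diff = {max_diff:.3e}")
            return i
    print("All layers match within tolerance.")
    return None
```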
@abheesht17 Here is the Colab version of the conversion script.
@abheesht17 did you get a chance to inspect the delta in output?
Thanks! Just some initial comments and questions.
@@ -77,7 +77,9 @@ def build(self, input_shape):
        # Defer packer creation to `build()` so that we can be sure tokenizer
        # assets have loaded when restoring a saved model.
        self.packer = StartEndPacker(
            start_value=self.tokenizer.start_token_id,
            start_value=self.tokenizer.start_token_id
Why do we need this? We pass add_start_value=self.add_start_token below when we call the layer. It seems simpler to configure the layer so the packer always knows the start value. And if a user were calling the packer directly, they could just pass add_start_token=True during the call.
If you check the Qwen tokenizer config, it doesn't have a BOS token. So the StartEndPacker setup throws an exception when it tries to access start_token_id, since the attribute isn't even there.
stacktrace:
> Keras 3 model and tokenizer loaded.
Traceback (most recent call last):
File "/Users/flip/Desktop/Projects/keras-hub/tools/checkpoint_conversion/convert_qwen_checkpoints.py", line 307, in <module>
app.run(main)
File "/Users/flip/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/Users/flip/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
^^^^^^^^^^
File "/Users/flip/Desktop/Projects/keras-hub/tools/checkpoint_conversion/convert_qwen_checkpoints.py", line 293, in main
test_tokenizer(keras_hub_tokenizer, hf_tokenizer)
File "/Users/flip/Desktop/Projects/keras-hub/tools/checkpoint_conversion/convert_qwen_checkpoints.py", line 234, in test_tokenizer
keras_hub_output = keras_hub_preprocessor(
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/flip/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Users/flip/Desktop/Projects/keras-hub/keras_hub/src/models/causal_lm_preprocessor.py", line 80, in build
start_value=self.tokenizer.start_token_id,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/flip/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1928, in __getattr__
raise AttributeError(
AttributeError: 'Qwen2Tokenizer' object has no attribute 'start_token_id'. Did you mean: 'pad_token_id'?
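One way to make the packer tolerant of tokenizers that define no BOS token (a sketch only, not necessarily the change this PR makes). Here `tokenizer` stands in for the Qwen2 tokenizer instance, and the end_token_id/pad_token_id attribute names follow the usual keras-hub tokenizer conventions.

```python
from keras_hub.layers import StartEndPacker

# Qwen defines no BOS token, so fall back to None instead of raising
# AttributeError when `start_token_id` is absent.
start_value = getattr(tokenizer, "start_token_id", None)

packer = StartEndPacker(
    sequence_length=512,
    start_value=start_value,  # None means "do not add a start token"
    end_value=tokenizer.end_token_id,
    pad_value=tokenizer.pad_token_id,
    return_padding_mask=True,
)
```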
@keras_hub_export("keras_hub.models.Qwen2Backbone")
class Qwen2Backbone(Backbone):
    """
    #TODO:
Let's add this in before merge! Even just a one-liner: "The Qwen2 decoder network."
done!
@keras_hub_export("keras_hub.models.Qwen2Backbone")
class Qwen2Backbone(Backbone):
How different is Qwen 1 from Qwen 2, btw?
There is no difference between the Qwen series models.
    misc_special_tokens -= {eos_token}

    # Add misc special tokens
    for i, token in enumerate(misc_special_tokens):
What are these used for? I don't see them used anywhere. A lot of tokenizers have reserved and unused tokens (e.g., for BERT the first thousand or so, I think); we don't generally give them special treatment.
I just followed the Llama 3 tokenizer!
        self._add_special_token(token, f"special_token_{i:03d}")
        special_tokens.add(token)

    # Add alternate EOS token if needed
When is this needed? And why?
my bad.
    rope_max_wavelength=hf_model.config.rope_theta,
    use_sliding_window=hf_model.config.use_sliding_window,
    sliding_window_size=hf_model.config.sliding_window,
    # dtype="bfloat16"
Are we saving at full precision? We should probably save in the same dtype we are converting from. If we are taking a bunch of bfloat16 weights and saving them as float32 (the Keras default), we are just wasting a ton of disk space for no gain. We can still load in a different dtype than we save.
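A sketch of that split, with hypothetical preset paths: keep the saved checkpoint in the source dtype, and let the numerics check (or a user) load it at a different dtype.

```python
import keras_hub

# Hypothetical preset paths; only the dtype handling matters here.
# Keep the converted checkpoint in its source dtype on disk...
backbone_bf16 = keras_hub.models.Backbone.from_preset(
    "./qwen2.5_0.5b_converted", dtype="bfloat16"
)
backbone_bf16.save_to_preset("./qwen2.5_0.5b_en_bf16")

# ...while the numerics check (or a user) can still load at float32.
backbone_fp32 = keras_hub.models.Backbone.from_preset(
    "./qwen2.5_0.5b_en_bf16", dtype="float32"
)
```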
While doing weight matching, if I load the models in float32, the outputs match with an atol of 1e-3; however, the delta is quite wide when I load in bfloat16. The difference in the intermediate outputs starts from the first layernorm (where casting to fp32, applying the norm, and casting back to bf16 happens).
@mattdangerw / @abheesht17 did you get a chance to take a look at it?
I think it's necessary to check in detail where the error is. As much as possible, we should ensure that the fp32 error is around 1e-5 under the torch backend. The maximum error of bf16 should not exceed 1e-2.
@mattdangerw / @abheesht17 / @divyashreepathihalli How do you completely disable the MPS backend with Keras? Please take a look at the latest conversion script: even though I am moving the model to the CPU using keras.device and also moving the inputs, the reversible embedding call step exits with a stacktrace.
The stacktrace indicates that an allocation is still happening on MPS, which I have already disabled!
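A couple of things that may help keep allocations off MPS under the torch backend (a sketch only; whether this fully avoids the MPS allocation seen here is exactly the open question):

```python
import os

os.environ["KERAS_BACKEND"] = "torch"  # must be set before importing keras

import torch
import keras

# Ask PyTorch to place newly created tensors on the CPU by default, in
# addition to the Keras-level device scope, since some allocations may
# happen outside that scope.
torch.set_default_device("cpu")

with keras.device("cpu"):
    # Build the converted model and run the forward pass here.
    ...
```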
Closes #2078
References:
Qwen 2.5 uses the Qwen2 backbone from Hugging Face Transformers.
HF Config path
HF Source Code