Conversation

@mseeger (Contributor) commented Aug 28, 2025

The computation of lora_ind only works for models with n_head == n_query_groups and n_embd == n_head * head_size. This is not the case for the Qwen3-4B model, for example.
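
For illustration only, a minimal sketch of the shape arithmetic (not the actual litgpt implementation, and assuming a simple [q | k | v] layout of the fused QKV output): with grouped-query attention the query block has n_head * head_size columns while the key and value blocks each have n_query_groups * head_size, so the index computation cannot rely on n_embd or on n_head == n_query_groups.

# Minimal sketch, not the litgpt code: index ranges into the fused QKV output,
# assuming its columns are laid out as [q | k | v].
n_head, n_query_groups, head_size = 32, 8, 128   # illustrative values only
n_embd = 2560                                    # note: != n_head * head_size

q_size = n_head * head_size            # 4096, not n_embd
kv_size = n_query_groups * head_size   # 1024 each for k and v

q_ind = list(range(0, q_size))
k_ind = list(range(q_size, q_size + kv_size))
v_ind = list(range(q_size + kv_size, q_size + 2 * kv_size))

# lora_ind would then be the concatenation of the ranges whose projection has
# LoRA enabled, e.g. enable_lora = (True, False, True) -> q_ind + v_ind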

@mseeger mseeger requested review from lantiga, t-vi and Borda as code owners August 28, 2025 20:04
@mseeger (Contributor, Author) commented Aug 29, 2025

This PR only changes LoRA code. The failing test does not use LoRA. I suspect the atol and rtol for this test may be a little too tight. Or do you see a different explanation?

@Borda (Member) commented Sep 1, 2025

> I suspect the atol and rtol for this test may be a little too tight. Or do you see a different explanation?

I would be fine with relaxing the tolerances a bit...
cc: @t-vi
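
Concretely, relaxing the tolerances would mean loosening the rtol/atol used in the comparison. A made-up illustration with torch.testing.assert_close (the tensors and values here are hypothetical, not the ones from the test):

import torch

# Hypothetical numbers only; the real test compares litgpt outputs against the
# HF reference model.
ours = torch.randn(2, 8, 16)
theirs = ours + 1e-4 * torch.randn_like(ours)

# A tight tolerance such as rtol=atol=1e-5 can fail depending on hardware and
# dtype; a slightly relaxed one passes.
torch.testing.assert_close(ours, theirs, rtol=1e-3, atol=1e-3)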

@mseeger (Contributor, Author) commented Sep 1, 2025

The relative difference is large. But when I run this on my Mac, the test passes. I have not tried it on a GPU.

Are we sure the test passes on main?

@mseeger (Contributor, Author) commented Sep 2, 2025

I ran the test in question on a GPU instance. It passes before and after this PR:

(valkeyrie) ubuntu@ip-172-31-26-205:~/git/litgpt$ pytest tests/test_model.py -k test_against_original_gemma_2
======================================================================= test session starts =======================================================================
platform linux -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/ubuntu/git/litgpt
configfile: pyproject.toml
plugins: dependency-0.6.0, anyio-4.10.0, rerunfailures-16.0, benchmark-5.1.0, timeout-2.4.0
collected 583 items / 579 deselected / 4 selected

tests/test_model.py ..xx                                                                                                                                    [100%]

========================================================== 2 passed, 579 deselected, 2 xfailed in 9.98s ===========================================================

@mseeger (Contributor, Author) commented Sep 2, 2025

The run seems to exclude the two GPU tests, and I have no idea why.

I don't know how to diagnose this further.
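
For reference, a generic sketch of what the xfail marker does (not the actual litgpt test): with strict=False, a test that raises AssertionError is reported as "xfailed" (an "x" in the progress line, so the two "x" entries in the run above are the 2 xfailed tests) and a passing test is reported as "xpassed"; neither makes the run fail. Running pytest with --runxfail makes it ignore the marker, so such tests report as ordinary passes or failures.

import pytest

# Generic sketch, not the litgpt test: with strict=False, an AssertionError is
# reported as "xfailed" instead of a failure, and a pass is reported as "xpassed".
@pytest.mark.xfail(raises=AssertionError, strict=False)
def test_comparison_that_sometimes_fails():
    assert False  # reported as xfailed, not as a test failure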

@mseeger (Contributor, Author) commented Sep 2, 2025

OK, I commented out the pytest.mark.xfail(raises=AssertionError, strict=False) marker, and now the test in question (test_against_original_gemma_2 in test_model.py) fails with quite large errors, BOTH on main and on my branch.

This means the test should either be fixed or disabled. The latter effectively happens on my instance, due to the pytest.mark.xfail(raises=AssertionError, strict=False) marker, but somehow the CI system seems to run the test anyway?

How to proceed here? @t-vi

@mseeger (Contributor, Author) commented Sep 2, 2025

Maybe the CI system runs the tests on CPU and they fail there. But the CPU tests work for me, both on my Mac laptop and on an EC2 instance. The comments in the test do not sound reassuring. I'd recommend disabling this test altogether until we are sure the code can be made to do what the HF side does.
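
If the test is to be disabled, a skip marker with a reason may be preferable to commenting it out, since it stays visible in reports. A hypothetical sketch (the decorator would go on the existing test; the stub is only for illustration):

import pytest

# Hypothetical sketch; reason text is illustrative.
@pytest.mark.skip(reason="large mismatch against the HF Gemma 2 reference; see PR discussion")
def test_against_original_gemma_2():
    ...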

@Borda (Member) commented Sep 4, 2025

cc: @t-vi ^^ 🐿️

@t-vi (Collaborator) left a comment

Thank you @mseeger @Borda
