LLama Adapters for GPT #2417

efraimdahl · 2025-03-10T14:39:03Z

efraimdahl
Mar 10, 2025

Cheers,
I'm working on a research project adding control conditions to a GPT2 based model, and I am interested in the approach used by Llama-Adapter , which combines adapter and soft-prompt/prefix tuning, which I think could be suitable to incooperate outside control information. The current implementation of the Llama-Adapter/AdaptionPrompt only supports Llama and Mistral models, thought the concept is model agnostic. Would adding GPT2 support be a welcome contribution, or am I missing something more fundamental.

BenjaminBossan · 2025-03-10T16:13:31Z

BenjaminBossan
Mar 10, 2025
Maintainer

In general, these kinds of PRs are welcome. I'm not sure how easy it would be to add support for the GPT2 architecture. But if you can figure something out, we would merge it.

0 replies

efraimdahl · 2025-03-14T22:45:36Z

efraimdahl
Mar 14, 2025
Author

I am currently in the process of evaluating my implementation, extending Llama Adapter to GPT2. I have had some success training them, but I am having trouble with two of the tests. For Llama theese are the tests test_sequence_adapter_ops, and test_add_and_set_while_disabled test in test_adaption_prompt.py. Both of theese tests include tests for the noninvasivness of the adapter when disabled or in its zero-initialised state before training. The way this is tested is by running a set of input_ids through the model before and after the adapters are added and making sure they are close enough. Now with GPT2 I have the following issue, the outputs are non-deterministic as discussed here and I am having trouble to enforce determinism for debugging purposes. The below code illustrates my point

config=GPT2Config(
            vocab_size=16,
            hidden_size=8, #mapped to n_embd
            n_layers=8, #mapped to n_layers
            num_attention_heads=4, #mapped to n_head
            use_cache=True,
            attn_implementation="eager"
        )
input_ids = torch.LongTensor([[1, 1, 1], [2, 1, 2]]).to(device)
target_ids = torch.LongTensor([[0, 0, 0], [0, 0, 0]]).to(device)
attention_mask = torch.LongTensor([[1, 1, 1], [1, 0, 1]]).to(device)
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

# Create and compare gpt2 model outputs .
model_gpt2 = GPT2Model(create_test_gpt2_config())
model_gpt2 = model_gpt2.to(device)
a= model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
b= model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
assert_close(a.last_hidden_state, b.last_hidden_state, rtol=0, atol=0)

This will fail along the lines of

AssertionError: Tensor-likes are not equal!

Mismatched elements: 48 / 48 (100.0%)
Greatest absolute difference: 1.3335938453674316 at index (0, 1, 0)
Greatest relative difference: 22.340858459472656 at index (0, 1, 7)

Now I understand that making sure the adapter does not effect the output (when zero initialized and when disabled) is important. Do you have any ideas on how to test this given the constraint of a non-deterministic GPT2?

1 reply

BenjaminBossan Mar 17, 2025
Maintainer

It would be easiest for us to help with this if you created a draft PR with your changes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LLama Adapters for GPT #2417

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 2 comments 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

LLama Adapters for GPT #2417

Uh oh!

Uh oh!

efraimdahl Mar 10, 2025

Replies: 2 comments · 1 reply

Uh oh!

BenjaminBossan Mar 10, 2025 Maintainer

Uh oh!

efraimdahl Mar 14, 2025 Author

Uh oh!

BenjaminBossan Mar 17, 2025 Maintainer

efraimdahl
Mar 10, 2025

Replies: 2 comments 1 reply

BenjaminBossan
Mar 10, 2025
Maintainer

efraimdahl
Mar 14, 2025
Author

BenjaminBossan Mar 17, 2025
Maintainer