Llama Adapters for GPT #2417
-
In general, these kinds of PRs are welcome. I'm not sure how easy it would be to add support for the GPT2 architecture, but if you can figure something out, we would merge it.
-
I am currently in the process of evaluating my implementation extending Llama-Adapter to GPT2. I have had some success training it, but I am having trouble with two of the tests. For Llama, these are the tests (adapted here to GPT2):

```python
import random

import numpy as np
import torch
from torch.testing import assert_close
from transformers import GPT2Config, GPT2Model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def create_test_gpt2_config():
    return GPT2Config(
        vocab_size=16,
        hidden_size=8,           # mapped to n_embd
        num_hidden_layers=8,     # mapped to n_layer
        num_attention_heads=4,   # mapped to n_head
        use_cache=True,
        attn_implementation="eager",
    )

input_ids = torch.LongTensor([[1, 1, 1], [2, 1, 2]]).to(device)
target_ids = torch.LongTensor([[0, 0, 0], [0, 0, 0]]).to(device)  # unused in this check
attention_mask = torch.LongTensor([[1, 1, 1], [1, 0, 1]]).to(device)

torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

# Create the GPT2 model and compare two forward passes on identical inputs.
model_gpt2 = GPT2Model(create_test_gpt2_config())
model_gpt2 = model_gpt2.to(device)
a = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
b = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
assert_close(a.last_hidden_state, b.last_hidden_state, rtol=0, atol=0)
```

This will fail with a mismatch between `a.last_hidden_state` and `b.last_hidden_state`.
Now I understand that making sure the adapter does not affect the output (when zero-initialized and when disabled) is important. Do you have any ideas on how to test this, given the constraint of a non-deterministic GPT2?
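I believe the non-determinism comes from the freshly constructed model being in train mode, so GPT2's dropout layers (which default to 0.1) fire differently on each forward pass. For concreteness, here is a rough sketch of the no-op check I have in mind, reusing the config and inputs from above. It is hypothetical, since AdaptionPrompt does not support GPT2 yet, and it assumes `eval()` mode to silence dropout:

```python
from peft import AdaptionPromptConfig, get_peft_model

# Reference output of the base model (eval mode disables dropout, so eager
# attention gives bit-identical repeated forward passes).
base = GPT2Model(create_test_gpt2_config()).to(device)
base.eval()
with torch.no_grad():
    ref = base(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state

# Hypothetical: wrap the same model with a zero-initialized adaption prompt.
# AdaptionPromptConfig is the real PEFT config class, but GPT2 is not a
# supported target yet, so this call is aspirational.
peft_config = AdaptionPromptConfig(adapter_len=4, adapter_layers=2)
adapted = get_peft_model(base, peft_config)
adapted.eval()
with torch.no_grad():
    out = adapted(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state

# With the adapter gate zero-initialized, the adapted model should be a no-op.
assert_close(out, ref, rtol=0, atol=0)
```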
-
Cheers,
I'm working on a research project adding control conditions to a GPT2-based model, and I am interested in the approach used by Llama-Adapter, which combines adapter and soft-prompt/prefix tuning (see the sketch below); I think this could be a suitable way to incorporate outside control information. The current implementation of Llama-Adapter/AdaptionPrompt only supports Llama and Mistral models, though the concept is model-agnostic. Would adding GPT2 support be a welcome contribution, or am I missing something more fundamental?
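Schematically, the mechanism I mean is the zero-initialized attention gate from the Llama-Adapter paper: learned prefix tokens are attended to by the frozen attention weights, and their contribution is scaled by tanh of a gate that starts at zero, so the adapted layer initially reproduces the base model exactly. A simplified sketch (my paraphrase, not the exact PEFT implementation):

```python
import torch

def gated_prefix_attention(attn_output, prefix_attn_output, gate):
    """Llama-Adapter-style gating (simplified): the attention output over
    the learned prefix tokens is scaled by tanh(gate). With the gate
    zero-initialized, tanh(0) == 0, so the layer reproduces the base
    model's output exactly at initialization."""
    return attn_output + torch.tanh(gate) * prefix_attn_output

# At initialization the gate is zero, so the adapter is a no-op:
attn_output = torch.randn(2, 3, 8)          # ordinary self-attention output (batch, seq, hidden)
prefix_attn_output = torch.randn(2, 3, 8)   # attention of the queries over the prefix tokens
gate = torch.zeros(1, requires_grad=True)   # zero-initialized, learnable
out = gated_prefix_attention(attn_output, prefix_attn_output, gate)
assert torch.equal(out, attn_output)
```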