
Commit 0ad385c

Summary: redoing 5bf70c1 in a way that doesn't get reverted

Test Plan:
export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4-gptq --calibration_tasks wikitext --calibration_limit 5
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4-gptq.g32.cuda.pth --tasks wikitext --limit 5

Reviewers:
Subscribers:
Tasks:
Tags:

ghstack-source-id: 1b4a8b43482ff27c8a300b571b2e3e81a13b29e4
Pull Request resolved: #142
1 parent: c955dac

File tree: 5 files changed (+513, -23 lines)

GPTQ.py: +2 -2

@@ -150,9 +150,9 @@ def __init__(
         }

         # trace model for one input
-        one_input = [multi.values[0] for multi in inputs]
+        one_input = tuple([multi.values[0].cpu() for multi in inputs])
         exported_model = torch._dynamo.export(
-            model, aten_graph=True, pre_dispatch=True, tracing_mode="fake"
+            model.cpu(), aten_graph=True, pre_dispatch=True, tracing_mode="fake"
         )(*one_input)
         super().__init__(exported_model.graph_module)
         self.new_state_dict = model.state_dict()
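For context, the diff moves both the example input and the model to CPU before tracing, so the fake-tensor export does not depend on whichever CUDA device the calibration inputs were captured on. Below is a minimal, self-contained sketch of that same torch._dynamo.export pattern; ToyModel and its shapes are hypothetical stand-ins for the real Llama checkpoint, but the aten_graph/pre_dispatch/tracing_mode arguments mirror the diff above.

import torch

class ToyModel(torch.nn.Module):
    """Hypothetical stand-in for the model being quantized."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.linear(x)

model = ToyModel()

# Calibration inputs may live on GPU; move one example input to CPU and
# trace a CPU copy of the model, as the commit does, so the exported graph
# does not bake in a CUDA device.
one_input = (torch.randn(2, 8).cpu(),)
exported = torch._dynamo.export(
    model.cpu(), aten_graph=True, pre_dispatch=True, tracing_mode="fake"
)(*one_input)

# The export returns a result whose graph_module is what GPTQ.py hands to
# its superclass constructor.
print(exported.graph_module.graph)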
