This repo shows two methods for syntax-error-free decoding:
- using a context-free grammar (defined with lark; see the first sketch below)
- using a finite-state automaton that is evaluated directly on the GPU and is thus 4x faster (see the second sketch below)
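For illustration, here is a minimal sketch of how a lark grammar can be used to check whether a partially decoded string is still syntactically valid. The grammar and helper below are toy examples under assumed tool-call syntax, not the repo's actual grammar or decoding loop.

```python
from lark import Lark, UnexpectedInput

# Hypothetical grammar for tool calls such as "<add>(3, 5.2)=";
# the repo's real grammar covers its full tool set.
CALL_GRAMMAR = r"""
    start: "<" NAME ">" "(" args ")" "="
    args: NUMBER ("," NUMBER)*
    NAME: /[a-z_]+/
    NUMBER: /-?[0-9]+(\.[0-9]+)?/
    %import common.WS
    %ignore WS
"""

parser = Lark(CALL_GRAMMAR, parser="lalr")

def is_valid_prefix(text: str) -> bool:
    """Return True if `text` can still be completed to a valid tool call."""
    try:
        # Interactive (incremental) parsing accepts a prefix without
        # requiring the input to form a complete parse.
        interactive = parser.parse_interactive(text)
        interactive.exhaust_lexer()
        return True
    except UnexpectedInput:
        return False

# During constrained decoding, candidate continuations that fail this check
# would be masked out before the next token is sampled.
assert is_valid_prefix("<add>(3, 5")    # still completable
assert not is_valid_prefix("<add>)(")   # can never become valid
```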
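The second approach can be sketched as a token-level finite-state automaton stored as lookup tensors on the GPU, so that checking allowed tokens and advancing the automaton are batched tensor operations. The toy states, vocabulary, and PyTorch code below are illustrative assumptions, not the repo's implementation.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy setup: 3 FSM states and a vocabulary of 5 token ids.
num_states, vocab_size = 3, 5

# allowed[s, t] == True iff token t may be emitted in state s.
allowed = torch.zeros(num_states, vocab_size, dtype=torch.bool, device=device)
allowed[0, [1, 2]] = True   # state 0 accepts tokens 1 and 2
allowed[1, [3]] = True      # state 1 accepts token 3
allowed[2, :] = True        # state 2 is unconstrained

# next_state[s, t] is the state reached after emitting token t in state s.
next_state = torch.zeros(num_states, vocab_size, dtype=torch.long, device=device)
next_state[0, 1], next_state[0, 2] = 1, 1
next_state[1, 3] = 2

def constrained_step(logits: torch.Tensor, state: torch.Tensor):
    """Mask one decoding step's logits and advance the FSM.

    logits: (batch, vocab_size) raw model logits
    state:  (batch,) current FSM state per sequence
    """
    mask = allowed[state]                              # (batch, vocab_size) lookup on the GPU
    masked = logits.masked_fill(~mask, float("-inf"))  # forbid invalid tokens
    token = masked.argmax(dim=-1)                      # greedy pick for this sketch
    state = next_state[state, token]                   # batched transition, still on the GPU
    return token, state

# Usage: start all sequences in state 0 and decode one step.
logits = torch.randn(2, vocab_size, device=device)
state = torch.zeros(2, dtype=torch.long, device=device)
token, state = constrained_step(logits, state)
```

Because both the mask lookup and the transition are plain tensor indexing, they run next to the model's forward pass without a host-device round trip, which is where the speed-up over CPU-side constraint checking comes from.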
It uses these approaches to improve the performance of the Zephyr 7B LLM on FuncQA, a math equation benchmark, where they outperform the current state of the art.
It was developed as part of a seminar at HPI; see the report and the papers listed below for additional resources.
Our results compared to the ToolDec baseline from the literature and to ChatGPT are as follows; details are given in the report.
| Model | Result on FuncQA |
|---|---|
| Zephyr 7B Chat (ours) + CFG | 14.7% |
| Zephyr 7B Chat (ours) + CFG + SFT | 19.1% |
| ToolDec | 13.2% |
| ChatGPT (0-shot) | 9.0% |
The methods implemented here are inspired by the following two papers:
- ToolDec: Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding [arxiv]
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings [arxiv]
You can reproduce the experiments by running:
pip install -r requirements.txt
sh scripts/training_commands.sh
sh scripts/eval_commands.sh