We construct FormalML from our own forks of the following libraries: `AutoML` and `repl` (the lean-repl dependency). Build both with `lake`:
```bash
cd extraction
cd AutoML
lake update
lake build
cd ..
cd repl
lake update
lake build
cd ..
```
Run the extraction script:

```bash
./run_all.sh
```

The generated benchmark files will be saved in:

```
./AutoML/FormalML
```
```
extraction/
├── AutoML/
│   ├── FormalML/        # Output benchmark files
│   └── ...              # AutoML-related code
├── repl/                # lean-repl dependency
├── run_all.sh           # Script for extracting theorems
```
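Once extraction completes, a quick way to confirm that output was produced is to count the files under `FormalML/` (a minimal check run from the `extraction/` directory; that outputs land as individual files there is an assumption based on the layout above):

```bash
# List and count the generated benchmark files.
# Assumes outputs are written as individual files under AutoML/FormalML/.
find ./AutoML/FormalML -type f | wc -l
```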
Notes:

- Make sure Python is installed for running the extraction script.
- Ensure Lean and `lake` are properly installed and available on your PATH (see the quick check below).
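As a sanity check before building, you can confirm the toolchain is reachable; the version strings below are illustrative (our evaluation environment uses Lean 4.18.0, as noted in the next section):

```bash
# Confirm the required tools are on PATH; exact versions may differ.
lean --version     # e.g. "Lean (version 4.18.0, ...)"
lake --version
python --version
```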
We provide a unified evaluation framework for whole-proof generation methods. The evaluation process consists of the following steps:
First, generate proofs with the prover under evaluation:

```bash
cd evaluation
python generation.py --prover_name deepseek_v15_rl --gpu 4 --dataset_path "zzhisthebest/LeanBenchmark" --n 32
```
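To get a feel for the benchmark before running generation, you can peek at the dataset directly. This is a minimal sketch assuming the Hugging Face `datasets` package is installed; the `"train"` split name is an assumption, not confirmed by this repository:

```bash
# Optional: inspect the benchmark dataset (requires `pip install datasets`;
# the "train" split name is an assumption).
python - <<'EOF'
from datasets import load_dataset

ds = load_dataset("zzhisthebest/LeanBenchmark", split="train")
print(len(ds), "records")  # number of benchmark problems
print(ds[0])               # one record from the benchmark
EOF
```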
We use a modified version of kimina-lean-server (adapted to our evaluation environment) with Lean version 4.18.0:

```bash
cd kimina-lean-server
pip install -e .
cp .env.template .env
bash setup.sh
bash setup_local.sh
cd ..
```
Finally, run the evaluation, where `file_name` is the output file produced by the generation step:

```bash
python eval.py --input_file file_name
```