# FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory

The benchmark dataset is available on Hugging Face (`zzhisthebest/LeanBenchmark`).

## Quick Start

### Data Extraction Tool

We construct FormalML from (our own fork of) the following libraries:


#### Requirements


#### Installation

1. Update and build AutoML:

```bash
cd extraction
cd AutoML
lake update
lake build
cd ..
```

2. Update and build lean-repl:

```bash
cd repl
lake update
lake build
cd ..
```

3. Extract theorems from the source libraries by running the extraction script:

```bash
./run_all.sh
```

The generated benchmark files will be saved in `./AutoML/FormalML`.

#### Directory Structure (simplified)

```
extraction/
├── AutoML/
│   ├── FormalML/        # Output benchmark files
│   └── ...              # AutoML-related code
├── repl/                # lean-repl dependency
└── run_all.sh           # Script for extracting theorems
```
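After extraction, you can sanity-check the output with a short script. This is only an illustrative sketch; the exact file layout under `AutoML/FormalML` is whatever `run_all.sh` produces.

```python
# Sketch: list the generated benchmark files (run from the extraction/ directory).
from pathlib import Path

out_dir = Path("AutoML/FormalML")
for path in sorted(out_dir.rglob("*")):
    if path.is_file():
        print(path.relative_to(out_dir))
```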

#### Notes

- Make sure Python is installed for running the extraction script.
- Ensure Lean and `lake` are properly installed and available in your `PATH` (see the check sketched below).
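To verify the prerequisites programmatically, a minimal check (assuming the tools are named `python3`, `lean`, and `lake` on your system) could look like:

```python
# Minimal sketch: confirm the required tools are discoverable on PATH.
import shutil

for tool in ("python3", "lean", "lake"):
    location = shutil.which(tool)
    print(f"{tool}: {location or 'NOT FOUND -- install it or fix your PATH'}")
```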

### Evaluation Framework

We provide a unified evaluation framework for whole-proof generation methods. The evaluation process consists of the following steps:

#### Step 1: Proof Generation

```bash
cd evaluation
python generation.py --prover_name deepseek_v15_rl --gpu 4 --dataset_path "zzhisthebest/LeanBenchmark" --n 32
```
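Before generating, you can inspect the benchmark directly from the Hugging Face Hub. The snippet below is a sketch: it assumes the `datasets` library and a `train` split; the actual split and field names may differ.

```python
# Sketch: peek at the benchmark problems consumed by generation.py.
from datasets import load_dataset

ds = load_dataset("zzhisthebest/LeanBenchmark", split="train")
print(f"{len(ds)} problems")
print(ds[0])  # one record, e.g. a formal statement to complete
```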

#### Step 2: Evaluation Setup

We use a modified version of kimina-lean-server (adapted to our evaluation environment) with Lean 4.18.0:

```bash
cd kimina-lean-server
pip install -e .
cp .env.template .env
bash setup.sh
bash setup_local.sh
```
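Once the server is running, generated proofs can be checked against it. The following is only a sketch based on the upstream kimina-lean-server client; the class name, method signature, and port are assumptions, so consult the server's README for the actual interface.

```python
# Sketch (unverified API): submit a proof to a running kimina-lean-server.
from client import Lean4Client  # assumed client module from kimina-lean-server

client = Lean4Client(base_url="http://localhost:12332")  # port is an assumption
result = client.verify(
    [{"custom_id": "demo-0", "proof": "theorem demo : 1 + 1 = 2 := by norm_num"}],
    timeout=30,
)
print(result)
```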

#### Step 3: Running Evaluation

```bash
cd ..
python eval.py --input_file file_name
```

Here `file_name` is the proof file produced in Step 1.
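Since Step 1 samples multiple proofs per problem (`--n 32`), results for whole-proof generation are typically reported as pass@k. The repository's `eval.py` may compute this differently; the function below is just the standard unbiased estimator from Chen et al. (2021).

```python
# Sketch: unbiased pass@k estimator over n samples with c successes.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k drawn samples is correct."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

print(pass_at_k(n=32, c=3, k=1))  # ~0.094
```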
