We construct FormalML from our own forks of the following libraries: `AutoML` and `repl` (the lean-repl dependency). Build both with `lake`:
```bash
cd extraction
cd AutoML
lake update
lake build
cd ..
cd repl
lake update
lake build
cd ..
```
Run the extraction script:

```bash
./run_all.sh
```

The generated benchmark files will be saved in:

```
./AutoML/FormalML
```
```
extraction/
├── AutoML/
│   ├── FormalML/        # Output benchmark files
│   └── ...              # AutoML-related code
├── repl/                # lean-repl dependency
├── run_all.sh           # Script for extracting theorems
```
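Once extraction completes, a quick way to confirm that output was produced is to count the files under `FormalML/` (a minimal check run from the `extraction/` directory; that outputs land as individual files there is an assumption based on the layout above):

```bash
# List and count the generated benchmark files.
# Assumes outputs are written as individual files under AutoML/FormalML/.
find ./AutoML/FormalML -type f | wc -l
```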
Notes:

- Make sure Python is installed for running the extraction script.
- Ensure Lean and `lake` are properly installed and available on your PATH (see the quick check below).
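As a sanity check before building, you can confirm the toolchain is reachable; the version strings below are illustrative (our evaluation environment uses Lean 4.18.0, as noted in the next section):

```bash
# Confirm the required tools are on PATH; exact versions may differ.
lean --version     # e.g. "Lean (version 4.18.0, ...)"
lake --version
python --version
```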
We provide a unified evaluation framework for whole-proof generation methods. The evaluation process consists of the following steps:
First, generate proofs with the prover under evaluation:

```bash
cd evaluation
python generation.py --prover_name deepseek_v15_rl --gpu 4 --dataset_path "zzhisthebest/LeanBenchmark" --n 32
```
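To get a feel for the benchmark before running generation, you can peek at the dataset directly. This is a minimal sketch assuming the Hugging Face `datasets` package is installed; the `"train"` split name is an assumption, not confirmed by this repository:

```bash
# Optional: inspect the benchmark dataset (requires `pip install datasets`;
# the "train" split name is an assumption).
python - <<'EOF'
from datasets import load_dataset

ds = load_dataset("zzhisthebest/LeanBenchmark", split="train")
print(len(ds), "records")  # number of benchmark problems
print(ds[0])               # one record from the benchmark
EOF
```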
We use a modified version of kimina-lean-server (adapted to our evaluation environment) with Lean version 4.18.0:

```bash
cd kimina-lean-server
pip install -e .
cp .env.template .env
bash setup.sh
bash setup_local.sh
cd ..
```
Finally, run the evaluation, where `file_name` is the output file produced by the generation step:

```bash
python eval.py --input_file file_name
```