
Lean Fine-Tuning Toolkit

Production-ready pipeline for extracting Lean theorem/proof pairs, augmenting inputs, and fine-tuning GPT-2 with LoRA adapters.

Highlights

  • Fetches Mathlib4, the Lean 4 stdlib, MiniF2F, and (optionally) LeanDojo, with caching and license annotations.
  • Robust Lean parser that captures theorem/lemma/example statements, proofs, imports, and metadata.
  • Input augmentation producing formal headers, math-form reductions, and configurable natural-language paraphrases.
  • Hugging Face DatasetDict builder with leakage-safe file hashing and GPT-2-compatible tokenization/masking (see the masking sketch after this list).
  • LoRA fine-tuning loop using PEFT + Hugging Face Trainer with early stopping, mixed precision, and rich logging.
  • Evaluation script reporting perplexity/token F1 and saving qualitative proof generations.
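
One common way to implement the masking step (the toolkit's exact scheme may differ): concatenate the prompt and the proof, then set the prompt positions in the labels to -100 so the loss is computed only on proof tokens. A minimal sketch, assuming hypothetical "input"/"target" record fields:

    # Hedged sketch of prompt-masked tokenization for GPT-2. The field names
    # ("input", "target") are illustrative assumptions, not the toolkit's API.
    from transformers import GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

    def tokenize_pair(example, max_length=512):
        prompt_ids = tokenizer(example["input"]).input_ids
        target_ids = tokenizer(example["target"] + tokenizer.eos_token).input_ids
        input_ids = (prompt_ids + target_ids)[:max_length]
        # -100 is ignored by the cross-entropy loss in Hugging Face models,
        # so only the proof tokens contribute to training.
        labels = ([-100] * len(prompt_ids) + target_ids)[:max_length]
        return {"input_ids": input_ids, "labels": labels}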

Quickstart

  1. Create environment (Python 3.10–3.12):

    conda create -n leanft python=3.11
    conda activate leanft

    If pyarrow fails to install via pip, install it first with conda install -c conda-forge pyarrow.

  2. Install the project:

    make setup

  3. Run the end-to-end CPU-friendly pipeline on a small subset:

    make fetch
    make extract
    make augment
    make build
    make train

    Each command accepts overridable environment variables (see the Makefile).

Workflow

  • scripts/fetch_data.py: clones Lean sources into data/raw, skipping ones already present.
  • scripts/extract_pairs.py: parses .lean files and writes data/pairs.jsonl (a naive parsing sketch follows this list).
  • scripts/augment_inputs.py: expands each theorem into multiple (input, target) variants stored in data/aug.jsonl.
  • scripts/build_hf_dataset.py: splits data with file-level hashing and saves a Hugging Face dataset to data/hf (see the hashing sketch below).
  • scripts/train_lora.py: loads configs/base.yaml, tokenizes for GPT-2, applies LoRA, and trains (see the LoRA sketch below).
  • scripts/evaluate.py: loads saved adapters, reports perplexity/token-F1, and writes generations to samples.txt.
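
The extraction step can be pictured with a deliberately naive pattern; the real parser in scripts/extract_pairs.py handles much more (example blocks, imports, metadata, comments, nested declarations), so treat this purely as a sketch of the idea:

    import re

    # Naive pattern: declaration keyword, name, statement up to ":=", proof term.
    PAIR_RE = re.compile(
        r"^(?:theorem|lemma)\s+(?P<name>\w+)\s*(?P<statement>.*?):=\s*(?P<proof>.+?)(?=\n\S|\Z)",
        re.MULTILINE | re.DOTALL,
    )

    src = "theorem add_zero (n : Nat) : n + 0 = n := by simp\n"
    for m in PAIR_RE.finditer(src):
        print(m.group("name"), "::", m.group("statement").strip())
        print("proof:", m.group("proof").strip())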
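
The leakage-safe split hashes at the file level so that every pair extracted from the same .lean file lands in the same split. A minimal sketch (hashing the source path here, with assumed 90/5/5 boundaries; the script's actual keying and ratios may differ):

    import hashlib

    def split_for(source_file: str) -> str:
        # Deterministic bucket in [0, 100) derived from the source file path,
        # so all pairs from one file share a split.
        bucket = int(hashlib.sha256(source_file.encode()).hexdigest(), 16) % 100
        if bucket < 90:
            return "train"
        return "validation" if bucket < 95 else "test"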
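
For the training step, the LoRA wiring with PEFT looks roughly like the following; the hyperparameters here are illustrative defaults, not the values in configs/base.yaml:

    from peft import LoraConfig, get_peft_model
    from transformers import GPT2LMHeadModel

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    lora = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["c_attn"],  # GPT-2's fused QKV projection
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # only the adapter weights are trainable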

Config & Logging

  • configs/base.yaml captures LoRA hyperparameters, dataset paths, device map, and logging backends.
  • Enable Weights & Biases or TensorBoard by setting report_to (e.g., ["wandb"]) and use_wandb: true.
  • Mixed precision defaults to null for portability; set bf16 on Apple Silicon/Ampere or fp16 on CUDA (see the sketch after this list).
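
For orientation, here is how those options typically map onto Hugging Face TrainingArguments (the key names in configs/base.yaml may differ):

    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="checkpoints",
        report_to=["wandb"],  # or ["tensorboard"]; use [] to disable reporting
        bf16=True,            # Ampere+/Apple Silicon; fp16=True on older CUDA GPUs
    )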

Evaluation Outputs

  • checkpoints/best: best-performing adapter checkpoint (by perplexity).
  • checkpoints/samples.txt: proof generations for qualitative inspection.
  • Metrics (perplexity, token F1) are printed by scripts/evaluate.py; a sketch of both metrics follows.
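
Both metrics are standard: perplexity is the exponential of the mean per-token negative log-likelihood, and token F1 compares generated and reference proofs as bags of tokens. A minimal sketch of how they can be computed (the script's exact tokenization may differ):

    import math
    from collections import Counter

    def perplexity(mean_nll: float) -> float:
        # exp of the mean per-token negative log-likelihood.
        return math.exp(mean_nll)

    def token_f1(pred_tokens: list[str], ref_tokens: list[str]) -> float:
        # Bag-of-tokens overlap between prediction and reference.
        overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred_tokens)
        recall = overlap / len(ref_tokens)
        return 2 * precision * recall / (precision + recall)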

Testing

  • pytest covers Lean parsing edge cases and dataset masking behaviour (tests/).
  • Use make setup && pytest before pushing changes or running large jobs.

Licensing

  • See LICENSES.md for upstream licenses (Mathlib4, Lean stdlib, MiniF2F, LeanDojo).

About

A framework for automatically verifying premise truth and inference validity in LLM outputs across applied domains (coding, customer service, R&D, etc.).
