LogiTest

Introduction

LogiTest is a benchmarking tool that evaluates the ability of large language models (LLMs) to simplify boolean logic expressions. It generates random boolean expressions, computes a reference solution using SymPy's symbolic logic engine, queries an LLM for its simplified form, and compares the two results to produce an accuracy score. Results are appended to a CSV file.

Requirements

Development Environment

Python 3.14+
An API key for the configured LLM provider

Production Environment

An API key for the configured LLM provider

Installation

$ pip install .

Or using uv:

$ uv sync

Configuration

Set the LLM model via the LOGITEST_LLM_MODEL environment variable (default: anthropic/claude-sonnet-4-5)
Set the API key via the LOGITEST_LLM_MODEL_API_KEY environment variable

export LOGITEST_LLM_MODEL=anthropic/claude-sonnet-4-5
export LOGITEST_LLM_MODEL_API_KEY=your-api-key-here
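The following is a minimal sketch of reading these two variables with the documented default. It assumes a hypothetical `load_config` helper that fails fast when the API key is missing. The README does not say how logitest.py handles a missing key.

```python
import os

# Default model name documented above.
DEFAULT_MODEL = "anthropic/claude-sonnet-4-5"


def load_config() -> tuple[str, str]:
    """Return (model, api_key) from the LOGITEST_* environment variables.

    Hypothetical helper: raises RuntimeError if the API key is not set.
    """
    model = os.environ.get("LOGITEST_LLM_MODEL", DEFAULT_MODEL)
    api_key = os.environ.get("LOGITEST_LLM_MODEL_API_KEY")
    if not api_key:
        raise RuntimeError("LOGITEST_LLM_MODEL_API_KEY is not set")
    return model, api_key
```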

Development Workflow

The project uses SymPy for reference solutions and the llm library for model interaction. The core flow is:

Boolean expressions are generated using SymPy symbols and operators
Expressions are converted to Python syntax and passed to the LLM
The LLM response is parsed back into SymPy's DNF form
The LLM solution is compared against the output of SymPy's simplify_logic
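The flow above can be sketched with SymPy's logic module. This is a minimal sketch, not the code in logitest.py. It assumes the reference is `simplify_logic(expr, form="dnf")`. It also assumes an answer passes only if it is logically equivalent to the reference and uses no more literals once converted with `to_dnf`. That pass rule is a guess that agrees with both rows of the example table under Examples. The row 1 reference shown there is not in DNF, so the real tool may call `simplify_logic` differently. `reference_solution`, `literal_count`, and `grade` are hypothetical helper names.

```python
from sympy import symbols
from sympy.logic.boolalg import And, Not, Or, Xor, simplify_logic, to_dnf
from sympy.logic.inference import satisfiable
from sympy.parsing.sympy_parser import parse_expr


def reference_solution(expr):
    """Reference answer: SymPy's simplified form, pinned to DNF (assumption)."""
    return simplify_logic(expr, form="dnf")


def literal_count(expr):
    """Count literal occurrences in an And/Or/Not expression in NNF."""
    if expr.is_Symbol or isinstance(expr, Not):
        return 1
    if isinstance(expr, (And, Or)):
        return sum(literal_count(arg) for arg in expr.args)
    return 0  # constants true/false


def grade(ref, llm_text, atoms):
    """Return 1 if the LLM answer is equivalent to ref and no larger, else 0.

    Hypothetical pass rule; the real comparison in logitest.py may differ.
    """
    # Resolve names only to the known atoms, so e.g. "E" or "I" are not
    # read as SymPy constants.
    parsed = parse_expr(llm_text, local_dict={str(s): s for s in atoms})
    candidate = to_dnf(parsed)  # expects &, |, ~ operators in the answer
    # Two formulas are equivalent iff their XOR is unsatisfiable.
    equivalent = satisfiable(Xor(ref, candidate)) is False
    return int(equivalent and literal_count(candidate) <= literal_count(ref))


if __name__ == "__main__":
    a, b, c = symbols("a b c")
    ref = reference_solution((b & ~a) | (a & b & c))  # row 1 of the table
    print(grade(ref, "(b & ~a) | (a & b & c)", [a, b, c]))  # 0: not simplified
    print(grade(ref, "b & (c | ~a)", [a, b, c]))  # 1
```

Checking equivalence with a SAT query on the XOR avoids enumerating truth tables by hand. Comparing literal counts is one possible way to reject answers that are correct but not simplified.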

Usage

Run LogiTest with one randomly generated expression:

$ python logitest.py

Examples

$ python logitest.py
$ python logitest.py --model anthropic/claude-opus-4-6
$ python logitest.py --include-default
$ python logitest.py --num-exprs 2
$ python logitest.py --num-exprs 2 --num-tries 2
$ python logitest.py --num-exprs 2 --num-tries 2 --extra-effort
$ python logitest.py --num-exprs 2 --num-tries 2 --extra-effort --atoms 'a,b,c,d'
$ python logitest.py --num-exprs 2 --seed 1024
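The examples above suggest that --atoms takes a comma-separated list of variable names and that --seed makes runs reproducible. The following hypothetical sketch shows how those flags could drive expression generation. The flag types and defaults are assumptions, and so is the `random_dnf` generator, which only reflects the fact that the sample expressions in this README are ORs of ANDs. The other flags are omitted because their behavior is not described here.

```python
import argparse
import random

from sympy import And, Not, Or, symbols


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a subset of the flags shown above."""
    parser = argparse.ArgumentParser(prog="logitest.py")
    # Types and defaults below are assumptions, not read from logitest.py.
    parser.add_argument("--num-exprs", type=int, default=1)
    parser.add_argument("--atoms", default="a,b,c")
    parser.add_argument("--seed", type=int, default=None)
    return parser


def random_dnf(atoms, rng, max_terms=3):
    """Hypothetical generator: an OR of 2..max_terms random ANDs of literals."""
    terms = []
    for _ in range(rng.randint(2, max_terms)):
        # Random non-empty subset of the atoms, each negated with p = 0.5.
        chosen = rng.sample(atoms, rng.randint(1, len(atoms)))
        terms.append(And(*(s if rng.random() < 0.5 else Not(s) for s in chosen)))
    return Or(*terms)


def generate(argv):
    args = build_parser().parse_args(argv)
    # 'a,b,c,d' -> [a, b, c, d]; seq=True keeps a single name as a sequence too.
    atoms = list(symbols(args.atoms, seq=True))
    # A dedicated RNG seeded from --seed makes runs reproducible.
    rng = random.Random(args.seed)
    return [random_dnf(atoms, rng) for _ in range(args.num_exprs)]
```

For example, `generate(["--num-exprs", "2", "--seed", "1024"])` returns the same two expressions on every call. Without --seed, each run draws fresh expressions.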

The output is a grid table showing each expression, the reference solution, the LLM solution, metadata, and a pass/fail result, followed by an overall accuracy score:

+-+----------------------------+------------------+----------------------+------+
|#|expression                  |ref_solution      |llm_solution          |result|
+=+============================+==================+======================+======+
|1|(b & ~a) | (a & b & c)      |b & (c | ~a)      |(b & ~a) | (a & b & c)|     0|
+-+----------------------------+------------------+----------------------+------+
|2|(a & c) | (b & c) | (b & ~a)|(a & c) | (b & ~a)|b | (a & c)           |     0|
+-+----------------------------+------------------+----------------------+------+
accuracy: 0.0 | avg_tries: 1.0 | correct: 0/2
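The following hypothetical `summarize` helper reproduces the format of the summary line. It assumes accuracy is the fraction of expressions whose result is 1 and avg_tries is the mean number of attempts per expression. Neither definition is stated in this README, and rounding to two decimals is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class ExprResult:
    passed: int  # 1 or 0, as in the result column
    tries: int   # attempts used for this expression (assumed field)


def summarize(results: list[ExprResult]) -> str:
    """Format the accuracy line; the metric definitions are assumptions."""
    n = len(results)
    correct = sum(r.passed for r in results)
    accuracy = round(correct / n, 2) if n else 0.0
    avg_tries = round(sum(r.tries for r in results) / n, 2) if n else 0.0
    return f"accuracy: {accuracy} | avg_tries: {avg_tries} | correct: {correct}/{n}"
```

The table above has two failing rows. If each was answered in one try, `summarize([ExprResult(0, 1), ExprResult(0, 1)])` yields the same summary line.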
