LogiTest is a benchmarking tool that evaluates the ability of large language models (LLMs) to simplify boolean logic expressions. It generates random boolean expressions, computes a reference solution using SymPy's symbolic logic engine, queries an LLM for its simplified form, and compares the two results to produce an accuracy score. Results are appended to a CSV file.
- Python 3.14+
- An API key for the configured LLM provider
$ pip install .
Or using uv:
$ uv sync
- Set the LLM model via the LOGITEST_LLM_MODEL environment variable (default: anthropic/claude-sonnet-4-5)
- Set the API key via the LOGITEST_LLM_MODEL_API_KEY environment variable
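As a rough illustration of how a script might read these two variables, here is a minimal sketch. The helper name `load_settings` and the choice to raise an error when the key is missing are assumptions for illustration, not LogiTest's actual code; only the variable names and the default model come from this README.

```python
import os

# Default model name as documented in this README.
DEFAULT_MODEL = "anthropic/claude-sonnet-4-5"


def load_settings(env=None):
    """Hypothetical helper: return (model, api_key) from environment variables."""
    env = os.environ if env is None else env
    model = env.get("LOGITEST_LLM_MODEL", DEFAULT_MODEL)
    api_key = env.get("LOGITEST_LLM_MODEL_API_KEY")
    if not api_key:
        # Assumption: failing early is friendlier than a provider error later.
        raise RuntimeError("LOGITEST_LLM_MODEL_API_KEY is not set")
    return model, api_key
```

The variables themselves can be set in the shell before running the tool: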
export LOGITEST_LLM_MODEL=anthropic/claude-sonnet-4-5
export LOGITEST_LLM_MODEL_API_KEY=your-api-key-here
The project uses sympy for reference solutions and the llm library for model interaction. The core flow is:
- Boolean expressions are generated using sympy symbols and operators
- Expressions are converted to Python syntax and passed to the LLM
- The LLM response is parsed back into SymPy's DNF form
- The LLM solution is compared against simplify_logic output
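The steps above can be sketched with SymPy alone, leaving out the LLM call. This is an illustration under assumptions, not the project's implementation: `random_expr` and `check` are hypothetical names, the LLM reply is a hard-coded string, and the exact comparison rule LogiTest applies is not spelled out here, so the sketch reports both logical equivalence and a stricter structural match against `simplify_logic`.

```python
import random

from sympy import symbols
from sympy.logic.boolalg import And, Not, Or, Xor, simplify_logic
from sympy.logic.inference import satisfiable
from sympy.parsing.sympy_parser import parse_expr


def random_expr(atoms, rng, depth=2):
    """Hypothetical generator: a random And/Or tree over possibly negated atoms."""
    if depth == 0:
        atom = rng.choice(atoms)
        return Not(atom) if rng.random() < 0.5 else atom
    op = rng.choice([And, Or])
    return op(random_expr(atoms, rng, depth - 1), random_expr(atoms, rng, depth - 1))


def check(expr, llm_reply, atoms):
    """Return (equivalent, exact_match) for a reply given as a Python-syntax string."""
    # Parse the reply, resolving names to the known atoms.
    llm_expr = parse_expr(llm_reply, local_dict={str(a): a for a in atoms})
    reference = simplify_logic(expr)
    # Two expressions are equivalent when their XOR has no satisfying assignment.
    equivalent = satisfiable(Xor(llm_expr, reference)) is False
    # Stricter check: the reply must match the reference structurally.
    exact_match = llm_expr == reference
    return equivalent, exact_match


a, b, c = atoms = symbols("a b c")
expr = random_expr(list(atoms), random.Random(1024))
prompt_text = str(expr)  # Python-style syntax using &, | and ~

# Expression 1 from this README's sample output, with the unsimplified
# expression itself standing in for the LLM reply.
sample = (b & ~a) | (a & b & c)
print(check(sample, "(b & ~a) | (a & b & c)", atoms))
```

Reporting the two checks separately makes it visible when a reply is logically correct but not actually simplified.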
Run LogiTest with one randomly generated expression:
$ python logitest.py
$ python logitest.py --model anthropic/claude-opus-4-6
$ python logitest.py --include-default
$ python logitest.py --num-exprs 2
$ python logitest.py --num-exprs 2 --num-tries 2
$ python logitest.py --num-exprs 2 --num-tries 2 --extra-effort
$ python logitest.py --num-exprs 2 --num-tries 2 --extra-effort --atoms 'a,b,c,d'
$ python logitest.py --num-exprs 2 --seed 1024
The output is a grid table showing each expression, the reference solution, the LLM solution, metadata, and a pass/fail result, followed by an overall accuracy score:
+-+----------------------------+------------------+----------------------+------+
|#|expression                  |ref_solution      |llm_solution          |result|
+=+============================+==================+======================+======+
|1|(b & ~a) | (a & b & c)      |b & (c | ~a)      |(b & ~a) | (a & b & c)|     0|
+-+----------------------------+------------------+----------------------+------+
|2|(a & c) | (b & c) | (b & ~a)|(a & c) | (b & ~a)|b | (a & c)           |     0|
+-+----------------------------+------------------+----------------------+------+
accuracy: 0.0 | avg_tries: 1.0 | correct: 0/2
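
The summary line and the CSV append could be produced along the following lines. This is a sketch under assumptions: the function names, the per-row `tries` field, and the CSV column names are hypothetical, since the actual file layout is not described here.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the real CSV schema is not documented here.
FIELDS = ["expression", "ref_solution", "llm_solution", "tries", "result"]


def summarize(results):
    """Build the summary line from rows holding 'result' (0 or 1) and 'tries'."""
    total = len(results)
    correct = sum(r["result"] for r in results)
    accuracy = correct / total if total else 0.0
    avg_tries = sum(r["tries"] for r in results) / total if total else 0.0
    return f"accuracy: {accuracy} | avg_tries: {avg_tries} | correct: {correct}/{total}"


def append_rows(path, results):
    """Append rows to a CSV file, writing the header only for a new or empty file."""
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(results)
```

Opening the file in append mode keeps earlier runs intact, so results from different models or seeds accumulate in one file.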