The public data and analysis repository for the paper:
Not Hallucination but Granularity: Error Taxonomy and Quality Audit of LLM-Based Legal Information Extraction
Diego Sens (sens.legal, OAB/PR)
| Field | Value |
|---|---|
| Scope | End-to-end expert audit of a production legal extraction pipeline |
| Audit sample | 100 Brazilian court decisions |
| Audited items | 1,042 |
| Courts | STJ, TJPR, TJSP, TRF4 |
| Core result | 96.0% precision with zero hallucinations in production models |
| Dominant error class | Granularity mismatch |
- production extraction reached 96.0% precision
- zero hallucinations were observed in the audited production sample
- granularity mismatch accounted for 31 of 42 errors (3.0% of all items)
- LLM-as-judge agreement varied sharply by model, with Cohen's kappa ranging from 0.23 to 0.74
| Asset | Description |
|---|---|
| data/sample_ids.json | Identifiers for the 100 audited decisions |
| data/error_taxonomy.json | Seven-type error taxonomy |
| data/audit/ | Expert audit files currently included in the repository |
| scripts/paper_stats.py | Statistics recomputation script |
| scripts/phase0_results.md | Supporting notes from the study workflow |
| LICENSE | Repository license (CC BY 4.0) |
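The availability table below notes that each sample ID carries a tribunal and a case number. A minimal sketch of working with such records; the JSON schema shown (a list of objects with tribunal and case_number keys) is an assumption about data/sample_ids.json, not documented here, and the case number is a placeholder:

```python
import json

# Hypothetical shape for data/sample_ids.json: assume a list of objects,
# each carrying a tribunal code and a case number (schema is an assumption).
example = json.loads('[{"tribunal": "STJ", "case_number": "0000000-00.0000.0.00.0000"}]')

def courts(entries):
    """Distinct tribunal codes appearing in a list of sample-ID records."""
    return sorted({e["tribunal"] for e in entries})

print(courts(example))  # → ['STJ']
```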
| Asset | Status | Notes |
|---|---|---|
| Sample IDs (100 decisions) | Available | Tribunal + case number |
| Error taxonomy | Available | Seven-type classification |
| Expert audit files (data/audit/) | Available | Current published audit material |
| Analysis script | Available | See note below on supplementary experiment inputs |
| Decision texts | Not included | Public judicial records |
| Extraction prompts | Not included | Proprietary |
The main recomputation script is:

```bash
python scripts/paper_stats.py
```

Important note: the script expects supplementary experiment files under scripts/exp3_results/ and scripts/controlled_extraction/. If those assets are not present in the local checkout, full recomputation will not run end to end.
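A small pre-flight check can make the missing-assets case explicit before running the script. The directory names come from the note above; the helper itself is just a convenience sketch, not part of the repository:

```python
from pathlib import Path

# Supplementary inputs named in the note above; without them the full
# recomputation in scripts/paper_stats.py cannot run end to end.
REQUIRED = ["scripts/exp3_results", "scripts/controlled_extraction"]

def missing_assets(root="."):
    """Return the required supplementary directories absent from the checkout."""
    return [p for p in REQUIRED if not (Path(root) / p).is_dir()]

print(missing_assets())
```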
| Code | Error type | Definition |
|---|---|---|
| HAL | Hallucination | Extracted concept does not exist in the decision text |
| OMI | Omission | Concept exists but extraction is incomplete |
| GRA | Granularity mismatch | Concept at the wrong level of specificity |
| MIS | Misattribution | Attributed to the wrong party or court |
| ANC | Anchoring failure | Linked to the wrong legal provision |
| DUP | Duplication | Same concept extracted multiple times |
| TYP | Type error | Content placed in the wrong field |
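The headline numbers reduce to simple arithmetic over these codes. A sketch using only the counts reported above (1,042 audited items, 42 errors, 31 of them GRA, zero HAL):

```python
# Counts reported in the repository summary: 1,042 audited items, 42 errors,
# of which 31 are granularity mismatches (GRA) and 0 are hallucinations (HAL).
TOTAL_ITEMS = 1042
TOTAL_ERRORS = 42
ERRORS = {"HAL": 0, "GRA": 31}  # remaining codes omitted; they sum to 11

precision = 1 - TOTAL_ERRORS / TOTAL_ITEMS
gra_rate = ERRORS["GRA"] / TOTAL_ITEMS

print(f"precision: {precision:.1%}")  # → precision: 96.0%
print(f"GRA rate:  {gra_rate:.1%}")   # → GRA rate:  3.0%
```

This is why the dominant error class is granularity mismatch: GRA alone accounts for roughly three quarters of all observed errors.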
```bibtex
@article{sens2026granularity,
  author = {Diego Sens},
  title  = {Not Hallucination but Granularity: Error Taxonomy and Quality Audit of {LLM}-Based Legal Information Extraction},
  year   = {2026},
  note   = {Preprint}
}
```

This repository is released under CC BY 4.0.