Your AI wrote a confident report. Is it actually true?
EFC-Plugin is a Claude Code plugin that turns your agent into a rigorous fact-checker — catching hallucinated numbers, fabricated data points, and exaggerated claims before they reach your reader.
🪞 This repo eats its own dog food: it is fact-checked against itself before every release. See FACTCHECK.md.
LLM agents are great at producing reports that look authoritative. The trouble is the parts that aren't — and they fail in predictable, repeatable ways:
| Failure mode | What it looks like |
|---|---|
| 🔢 Unit / scale errors | $5.3B that should be $530M — a dropped conversion |
| 📈 Fabricated interpolation | A 6-point chart where only 2 points were ever found |
| 🔀 Source conflation | "GMV" reported as "revenue"; "trade" as "exports" |
| 🕰️ Stale data as current | 2023 figures presented as 2025 actuals |
| 🎭 Attribution laundering | A blog cited as if it were a regulatory filing |
These aren't random slips — they're systematic patterns that show up whenever an LLM does research at scale. This plugin gives your agent a structured protocol to hunt them down, plus small scripts that automate the tedious first steps.
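To make the first failure mode concrete, here is a minimal sketch of a unit/scale check. It is an illustration only, not the plugin's own script: the function name `scale_mismatch` and the `tolerance` parameter are assumptions made for this example. The idea is that a dropped conversion usually leaves two figures apart by almost exactly a power of ten.

```python
import math
from typing import Optional


def scale_mismatch(reported: float, reference: float,
                   tolerance: float = 0.01) -> Optional[int]:
    """Return the power of ten separating two figures, or None.

    Hypothetical helper: a non-zero result suggests a dropped or doubled
    unit conversion (for example millions reported as billions).
    """
    if reported <= 0 or reference <= 0:
        return None  # log10 is undefined here; leave it to a human
    exponent = math.log10(reported / reference)
    nearest = round(exponent)
    # Only flag ratios that sit very close to a non-zero power of ten.
    if nearest != 0 and abs(exponent - nearest) <= tolerance:
        return nearest
    return None


# $5.3B reported where the source says $530M: off by one order of magnitude.
print(scale_mismatch(5.3e9, 530e6))
# Ordinary disagreement between two figures is not a scale error.
print(scale_mismatch(4.2e9, 4.3e9))
```

A hit from a check like this is only a prompt to reread the source; it cannot tell which of the two figures is the wrong one.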
Point it at a flawed report and it produces a verdict-by-verdict audit:
> fact-check examples/sample-report.md
❌ Errors Found
| Claim | Reported | Actual | Mode |
|-------------------------------|-----------|---------------------|------------------|
| FY2024 revenue | $4,200B | $4.2B (per table) | Unit/scale error |
| App "generated $1.2B revenue" | revenue | marketplace GMV | Source conflation|
⚠️ Unverifiable
| 2019–2023 revenue series | only FY2024 had a source → likely interpolated |
🔗 Broken links: 2 of 3 source URLs don't resolve
Overall reliability: Low — do not publish without correction
Try it yourself: examples/sample-report.md is a deliberately flawed (fictional) report with one of each failure mode, and examples/expected-fact-check.md shows the target output.
It's a standard Claude Code plugin — two lines to install:
/plugin marketplace add Nlai741533/EFC-Plugin
/plugin install fact-check@everything-fact-checked
Wait, why three different names?
This project wears three hats, so it answers to three names:
| Name | What it is | Where you see it |
|---|---|---|
| `EFC-Plugin` | the GitHub repository | clone URL, `marketplace add` |
| `everything-fact-checked` | the marketplace + the PyPI package | `install …@everything-fact-checked`, `pip install` |
| `fact-check` | the plugin/skill itself | `install fact-check@…`, the `/fact-check` prompt |
So /plugin install fact-check@everything-fact-checked reads as "install the
fact-check plugin from the everything-fact-checked marketplace."
Or kick the tires for a single session, no install:
git clone https://github.com/Nlai741533/EFC-Plugin
claude --plugin-dir ./EFC-Plugin

Verified against Claude Code 2.1.143. There is no `claude skill add` command; use the plugin commands above. (`--plugin-url` exists but expects a packaged `.zip` URL, not a repo page, so it is intentionally not shown here.)
Then just ask, in plain language:
fact-check this report
verify the numbers in the market analysis
audit the data in this deliverable
The skill runs a structured 6-step workflow:
- Inventory every specific, checkable claim
- Triage by risk (P0 critical → P3 cosmetic)
- Verify P0/P1 claims against primary sources
- Cross-check charts and tables for internal consistency
- Audit the source list (do the links even resolve?)
- Report — every verdict backed by a record in a standard evidence format
It also treats all source content as untrusted data, not instructions — so a document can't sweet-talk the fact-checker into stamping itself "verified."
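The triage and verify steps above can be pictured with a small data model. This is a sketch under stated assumptions, not the skill's internal representation: the `Claim` class, its field names, and `verification_queue` are hypothetical. Only the P0 to P3 scale and the rule that P0/P1 claims go to primary sources come from the workflow.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    P0 = 0  # critical
    P1 = 1
    P2 = 2
    P3 = 3  # cosmetic


@dataclass
class Claim:
    claim_id: str       # hypothetical identifier
    text: str           # the checkable statement as written in the report
    priority: Priority


def verification_queue(claims: list[Claim]) -> list[Claim]:
    """Return the P0/P1 claims, most critical first."""
    urgent = [c for c in claims if c.priority <= Priority.P1]
    return sorted(urgent, key=lambda c: c.priority)


claims = [
    Claim("C003", "Logo colour changed in 2021", Priority.P3),
    Claim("C002", "App generated $1.2B revenue", Priority.P1),
    Claim("C001", "FY2024 revenue was $4,200B", Priority.P0),
]
for claim in verification_queue(claims):
    print(claim.claim_id, claim.priority.name)
```

Ordering by priority keeps the expensive step, opening primary sources, focused on the claims that would do the most damage if wrong.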
- ✅ A disciplined operating procedure — triage, primary-source preference, chart/table tracing, marketing-claim labeling, a standard evidence format.
- ✅ A CLI (`efc`) for local use and CI: claim extraction, link checking, evidence validation, source-content verification, and full audit.
- ✅ A GitHub Action that auto-checks markdown reports in PRs.
- ❌ Not a push-button oracle. Deciding whether a source truly supports a claim is a judgment call — the agent still opens the primary sources. The tools tell you what to check, not whether it's true.
Install and use:
pip install .
# or: pipx install .
efc version # show version
efc extract report.md # inventory claims
efc extract report.md --json # JSON output
efc links report.md # check source URLs
efc links --no-network report.md # list URLs only
efc evidence evidence.json # validate evidence records
efc verify evidence.json # verify source content matches claims
efc audit --no-network report.md # full audit (claims + links + summary)
efc audit --json report.md # machine-readable audit for CI

Exit codes: 0 = clean, 1 = problems found, 2 = usage/IO error.
No third-party dependencies — standard library only. Tested on Python 3.11–3.13.
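The exit codes listed above make the CLI easy to wrap from a script. The following is a minimal sketch that assumes `efc` is installed and on `PATH`. The helper names `describe_exit_code` and `run_audit` are made up for this example. The command line itself is the `efc audit --no-network` invocation shown earlier.

```python
import subprocess

# Meanings as documented: 0 = clean, 1 = problems found, 2 = usage/IO error.
EXIT_MEANINGS = {0: "clean", 1: "problems found", 2: "usage/IO error"}


def describe_exit_code(code: int) -> str:
    """Translate an efc exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_audit(report_path: str) -> str:
    """Run an offline audit and return a human-readable status.

    Hypothetical wrapper: requires the efc command to be installed.
    """
    result = subprocess.run(
        ["efc", "audit", "--no-network", report_path],
        capture_output=True,
        text=True,
    )
    return describe_exit_code(result.returncode)


print(describe_exit_code(1))
```

Calling `run_audit("report.md")` would then return one of the three documented statuses, which a CI step could use to decide whether to fail the build.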
Every verdict can be recorded in a machine-checkable evidence format; efc evidence enforces the schema and the cross-field rules (e.g. a verified verdict must cite a well-formed http/https URL and, for P0/P1 claims, a primary or secondary source). See examples/evidence-sample.json.
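The cross-field rule quoted above can be sketched in a few lines. This is not the real schema: the keys `verdict`, `priority`, `source_url`, and `source_type` are assumptions chosen for illustration, so consult examples/evidence-sample.json for the actual field names.

```python
from urllib.parse import urlparse


def check_record(record: dict) -> list[str]:
    """Return rule violations for one evidence record (hypothetical keys)."""
    problems: list[str] = []
    if record.get("verdict") != "verified":
        return problems  # the rule sketched here only constrains verified verdicts

    parsed = urlparse(record.get("source_url") or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("verified verdict needs a well-formed http/https URL")

    high_risk = record.get("priority") in ("P0", "P1")
    if high_risk and record.get("source_type") not in ("primary", "secondary"):
        problems.append("P0/P1 verified verdict needs a primary or secondary source")
    return problems


good = {"verdict": "verified", "priority": "P0",
        "source_url": "https://example.com/filing", "source_type": "primary"}
bad = {"verdict": "verified", "priority": "P1",
       "source_url": "ftp://example.com/post", "source_type": "blog"}
print(check_record(good))
print(check_record(bad))
```

Returning a list of problems rather than raising on the first one lets a validator report every broken rule in a record at once.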
efc verify goes beyond link-checking: it fetches cited URLs, extracts visible text, and checks whether the claimed figure or key terms actually appear in the source.
efc verify evidence.json # verify all records
efc verify --claim C001 evidence.json # verify one claim
efc verify --json evidence.json # JSON output

Verdicts: `found` ✅ | `not_found` ❌ | `ambiguous` | `skipped` ⏭️ | `fetch_failed` 🔌
`found` is necessary, not sufficient. This is a term-overlap heuristic: a `found` means the page contains the claim's key terms (numbers, years, names), not that the page actually supports the claim. A long article can contain "$4.2B" and "2024" in unrelated sentences. Treat `found` as "worth a human read" and `not_found`/`ambiguous` as "investigate or flag", never as a final verdict on truth.
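A toy version of a term-overlap check shows why a match is weak evidence. This is a sketch written for this explanation, not the code behind `efc verify`: both function names are hypothetical, and the sentence splitter is deliberately naive.

```python
import re


def terms_present(page_text: str, key_terms: list[str]) -> bool:
    """True when every key term occurs somewhere in the text."""
    lowered = page_text.lower()
    return all(term.lower() in lowered for term in key_terms)


def terms_share_a_sentence(page_text: str, key_terms: list[str]) -> bool:
    """True when at least one sentence contains all key terms."""
    # Naive split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", page_text)
    return any(terms_present(sentence, key_terms) for sentence in sentences)


page = ("The fund manages $4.2B in assets. "
        "In 2024 the company opened a new office.")
terms = ["$4.2B", "2024"]

print(terms_present(page, terms))           # whole-page overlap
print(terms_share_a_sentence(page, terms))  # stricter, still a heuristic
```

Here the whole-page check passes even though the page never says anything about a 2024 figure of $4.2B. Requiring the terms to share a sentence narrows the gap, but it is still term matching and still needs a human read.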
Fact-check markdown reports automatically in PRs:
- uses: Nlai741533/EFC-Plugin@v0.2.4
  with:
    check-links: 'true'
    fail-on-broken-links: 'true'
    link-timeout: '10'

The action extracts claims from .md files (excluding docs like README, CHANGELOG, FACTCHECK), checks source links, and posts results to the PR summary.
This is designed to be a foundation, not a finished product. PRs and forks are very welcome. Some directions that would make it more powerful:
- More extractors — the inventory already covers figures, percentages, dates, and superlatives; add named entities, quotes, and currency-conversion pairs.
- PDF & table parsing — extract claims straight from filings and spreadsheets.
- Exchange-rate sanity checks — flag conversions made without a stated rate.
- Domain packs — tuned source hierarchies for finance, science, law, medicine, etc.
- Report scoring — produce a reliability score based on source coverage, P0/P1 verification rate, and broken link rate.
New to the repo? Good first issues: add a new claim type to extract_claims.py (with a test), or add a fixture for a failure mode that isn't covered yet.
python3 -m unittest discover -s tests -v

CI validates the plugin/marketplace manifests, runs the test suite, and smoke-tests the scripts on every push. Please add a test with any new behavior, and run the scripts on your own change, in the spirit of the project. 🙂
Before any doc change ships, the repo is fact-checked against itself: install commands are run against claude --help/--version, links are checked, and claims are verified against primary sources. The result lives in FACTCHECK.md. (Round 1 shipped a fabricated claude skill add command — this routine exists so that never happens again.)
MIT — use it, fork it, ship it. If it saves you from publishing a wrong number, ⭐ the repo so others can find it too.