TrustLens

Audit ML models beyond accuracy — calibration, fairness, latent health, and deployment verdicts.

Quickstart · How It Works · Demo Video · Docs · Project Showcase

Your model has 92% accuracy. It's still not safe for deployment.

Accuracy measures what went right. TrustLens measures what can go wrong — in production, on subgroups, and at high confidence.

Why TrustLens

Standard evaluation stops at accuracy. Silent failures happen when:

A model is overconfident — "90% sure" but right only 60% of the time
Performance collapses on subgroups — gender, age, or region hidden inside a good aggregate score
The model is confidently wrong — high-confidence errors that indicate systemic risk
Latent representations overlap — classes bleed together where the model can't tell them apart

TrustLens surfaces all four with a single audit, and outputs a machine-readable deployment verdict.
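
To make the first and third failure modes concrete, here is a minimal pure-Python sketch of a binned expected calibration error (ECE) and a high-confidence error count. It is an illustration under stated assumptions, not TrustLens's internal implementation: the function names, the 10-bin scheme, and the 0.9 threshold are all hypothetical choices made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted average of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


def high_confidence_errors(confidences, correct, threshold=0.9):
    """Indices of predictions that were wrong despite confidence >= threshold."""
    return [
        i
        for i, (conf, ok) in enumerate(zip(confidences, correct))
        if conf >= threshold and not ok
    ]


# Ten predictions, each made at 90% confidence, of which only six are correct.
confidences = [0.9] * 10
correct = [True] * 6 + [False] * 4

ece = expected_calibration_error(confidences, correct)
errors = high_confidence_errors(confidences, correct)

print(f"ECE: {ece:.2f}")
print(f"High-confidence errors: {len(errors)}")
```

In this toy case accuracy alone reports 60%, while the calibration gap of 0.30 and the four confident mistakes show the model claims far more certainty than it has earned.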

Supported Frameworks

TrustLens uses a Prediction Resolver Architecture to automatically handle different ML frameworks:

scikit-learn — Full support for all ClassifierMixin estimators.
XGBoost — Native support for XGBClassifier and raw Booster objects.
LightGBM — Native support for LGBMClassifier and raw Booster objects.
CatBoost — Native support for CatBoostClassifier.
Planned — PyTorch, TensorFlow/Keras.

TrustLens automatically detects your model's framework. You don't need to change your code when switching from sklearn to XGBoost.
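
One way such detection could work is to inspect the module that defines the model's class and then pick a prediction call accordingly. The sketch below is an assumption about the general technique, not the actual TrustLens resolver: `detect_framework`, `resolve_probabilities`, and `FakeClassifier` are hypothetical names used only for illustration.

```python
KNOWN_FRAMEWORKS = {"sklearn", "xgboost", "lightgbm", "catboost"}


def detect_framework(model):
    """Guess the framework from the top-level module of the model's class."""
    root = type(model).__module__.split(".")[0]
    return root if root in KNOWN_FRAMEWORKS else "unknown"


def resolve_probabilities(model, X):
    """Hypothetical resolver: return (framework, class probabilities)."""
    framework = detect_framework(model)
    if hasattr(model, "predict_proba"):
        # scikit-learn style estimators, including the sklearn-compatible
        # wrappers of the boosting libraries, expose predict_proba.
        return framework, model.predict_proba(X)
    # Raw Booster objects have no predict_proba, so a real resolver would
    # need framework-specific handling at this point.
    raise TypeError(f"No probability interface found for {framework!r} model")


class FakeClassifier:
    """Stand-in so the sketch runs without any ML library installed."""

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


# Pretend the class was defined inside the xgboost package.
FakeClassifier.__module__ = "xgboost.sklearn"

framework, probs = resolve_probabilities(FakeClassifier(), [[1.0], [2.0]])
print(framework, probs)
```

Dispatching on the class's module rather than on `isinstance` checks means the optional libraries never have to be imported just to identify a model.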

Documentation

Explore the full TrustLens documentation:

Quickstart

pip install trustlens
# Extended visualization support
pip install "trustlens[full]"

Run a one-line audit to see why 94% accuracy isn't the full story:

from trustlens import quick_analyze

quick_analyze(dataset="breast_cancer")

TRUST SCORE: 68/100 [D]
Assessment : Low Trust — Blocked by high diagnostic risk

  Base Score        : 76
  Penalties Applied : -7.7 (Failure Risk)
  Final Score       : 68

→ Model shows high failure risk and is NOT ready for deployment.

How It Works

TrustLens runs four diagnostic modules and combines them into a single Trust Score (0–100) with a CI/CD-ready deployment verdict.

| Module | What It Catches |
| --- | --- |
| Calibration | Confidence vs. correctness mismatch, overconfidence, ECE |
| Fairness | Subgroup performance gaps, equalized-odds violations |
| Representation | Latent space health, class separation, overlap detection |
| Decision Engine | Composite Trust Score + `Ready` / `Blocked` verdict |
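
As an illustration of what the Fairness module looks for, the sketch below computes per-group accuracy and the gap between the best and worst group. It is a simplified example under assumptions, not the module's actual metric set: `accuracy_by_group` and the toy data are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each value of a sensitive feature."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {group: hits[group] / totals[group] for group in totals}


# Toy data: the aggregate score looks acceptable, one subgroup does not.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
gender = ["f"] * 5 + ["m"] * 5

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = accuracy_by_group(y_true, y_pred, gender)
gap = max(per_group.values()) - min(per_group.values())

print(f"overall accuracy: {overall:.2f}")
print(f"per-group accuracy: {per_group}")
print(f"largest gap: {gap:.2f}")
```

Here 70% overall accuracy hides a split of 100% for one group and 40% for the other, which is the kind of gap an aggregate metric cannot show.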

Scientific Validation

TrustLens is more than a visualization tool: it is a statistically grounded diagnostic framework. Its behavior has been systematically validated across six model architectures and multiple data-corruption scenarios (noise, imbalance, bias).

Key Finding: TrustLens empirically decouples Accuracy from Trust, flagging high-accuracy models that exhibit high reliability risks (the "Overconfidence Zone").

View the Model Zoo Benchmark

Full Audit

Automatic Detection (scikit-learn / XGBoost / LightGBM / CatBoost)

from trustlens import analyze

# Works the same way for XGBClassifier, LGBMClassifier, or CatBoostClassifier
from xgboost import XGBClassifier
# from lightgbm import LGBMClassifier
# from catboost import CatBoostClassifier

# Assumes X_train, y_train, X_test, y_test, and gender_test are already defined
model = XGBClassifier().fit(X_train, y_train)

# TrustLens automatically detects the framework and resolves predictions
report = analyze(
    model=model,
    X=X_test,
    y_true=y_test,
    sensitive_features={"gender": gender_test}
)

report.show()

Manual Prediction Override

For external inference systems or unsupported frameworks, you can pass predictions directly:

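from trustlens import analyze

# external_preds and external_probs come from your own inference system
report = analyze(
    model=None,  # optional when passing y_pred/y_prob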
    X=X_test,
    y_true=y_test,
    y_pred=external_preds,
    y_prob=external_probs
)
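
If the external system returns only class probabilities, the label predictions can be derived from them before calling `analyze`. The sketch below assumes `y_prob` is laid out as one row per sample and one column per class, which this README does not confirm; `labels_from_probabilities` is a hypothetical helper.

```python
def labels_from_probabilities(probabilities, classes):
    """Pick, for each row, the class with the highest probability."""
    return [
        classes[max(range(len(row)), key=row.__getitem__)]
        for row in probabilities
    ]


# Probabilities as they might arrive from an external inference service.
external_probs = [
    [0.80, 0.20],
    [0.30, 0.70],
    [0.55, 0.45],
]
external_preds = labels_from_probabilities(external_probs, classes=[0, 1])

print(external_preds)
```

Deriving both arguments from the same probability matrix keeps `y_pred` and `y_prob` consistent with each other.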

Audit Metadata & Provenance

Every report tracks its own backend provenance for auditability:

print(report.metadata["framework"])  # "xgboost" | "lightgbm" | "catboost" | "sklearn"
print(report.metadata["backend"])    # {'resolver': 'xgboost', 'framework_version': '2.0.3', ...}

Save & Export

# Save as a unified JSON artifact (best for experiment trackers)
report.save("report.json")

# Save as a full directory bundle (best for human review)
report.save("trust_report/")

Output artifacts (Directory Bundle)

trust_report/
├── trust_score.json    ← deployment verdict & composite score
├── report.json         ← raw diagnostic metrics
├── metadata.json       ← environment, version, backend provenance
├── report.txt          ← human-readable summary
└── visuals/            ← per-module diagnostic plots (PNG)

CI/CD gating

Gate model promotion on trust_score.json — no custom scripting needed:

{
  "score": 68,
  "grade": "D",
  "verdict": "Low Trust — Blocked by high failure risk",
  "is_blocked": true
}
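
Where a pipeline step cannot read the JSON flag directly, a small wrapper can turn it into an exit code. This is a minimal sketch using only the fields shown above; `gate_exit_code` is a hypothetical helper, and treating a missing `is_blocked` flag as blocked is a design choice of this example rather than documented TrustLens behavior.

```python
import json
import tempfile
from pathlib import Path


def gate_exit_code(score_path):
    """Return 0 if the model may be promoted, 1 if it is blocked."""
    payload = json.loads(Path(score_path).read_text(encoding="utf-8"))
    # Fail closed: a file without the flag counts as blocked.
    return 1 if payload.get("is_blocked", True) else 0


# Demonstration against a temporary file that mirrors the example above.
with tempfile.TemporaryDirectory() as tmp:
    score_file = Path(tmp) / "trust_score.json"
    score_file.write_text(
        json.dumps({"score": 68, "grade": "D", "is_blocked": True}),
        encoding="utf-8",
    )
    exit_code = gate_exit_code(score_file)

print(exit_code)
```

In a real pipeline the step would end with `sys.exit(gate_exit_code("trust_report/trust_score.json"))`, so a blocked model fails the job.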

Diagnostics in Practice

Calibration: Does confidence align with correctness?
Fairness & Bias: Are subgroups treated equally?
Latent Space Health: Is class separation clean?
Deployment Verdict: Is this model safe to ship?

Demo

15-minute walkthrough: diagnostics, trust scoring, fairness analysis, and visual dashboards.

Want a deeper look at the architecture and design decisions? → Interactive Project Showcase

Run the Full Demo

python demo.py

Generates multi-model comparisons, fairness deep-dives, latent space projections, JSON audits, and visual dashboards across all modules.

Contributing

All contributions welcome — new metrics, diagnostic plugins, and visualizations.

→ Contributing Guide · Open an Issue · Docs

Citation

@software{trustlens2026,
  author = {Shahid Ul Islam},
  title  = {TrustLens: Audit ML models beyond accuracy},
  year   = {2026},
  url    = {https://github.com/Khanz9664/TrustLens}
}

Built by Shahid Ul Islam · Portfolio · LinkedIn
