Releases: ksanyok/TextHumanize
v0.25.0
What's Changed
Bug Fixes
- CRITICAL: Fixed naturalizer.py regex crash for RU/UK text (~50 patterns with non-capturing groups + backreferences). The entire naturalization stage was silently skipped.
- Added thread-safety locks to `_ai_cache` and `_AI_WORDS` for multi-threaded usage.
- Added division-by-zero guards in detector metric calculations.
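Both fixes follow standard patterns; a minimal standalone sketch of the idea (names and structure are illustrative, not the library's actual internals):

```python
import threading

_cache_lock = threading.Lock()
_ai_cache = {}  # hypothetical shared cache, guarded by the lock

def cache_get_or_compute(key, compute):
    """Thread-safe read-through access to the shared cache."""
    with _cache_lock:
        if key not in _ai_cache:
            _ai_cache[key] = compute(key)
        return _ai_cache[key]

def safe_ratio(numerator, denominator, default=0.0):
    """Division-by-zero guard for detector metric calculations."""
    return numerator / denominator if denominator else default
```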
Cleanup
- Removed dead module `tokenizer.py` (replaced by `sentence_split.py`).
- Removed 14 one-off diagnostic scripts, 4 outdated competitive analysis docs, and debug artifacts.
- Synced PHP and JS package versions to 0.25.0.
Documentation
- Corrected the pipeline stage count from 17 to 20 across all 15+ documentation files.
- Corrected test counts, LOC claims, and speed benchmarks for consistency.
- Fixed CHANGELOG date chronology.
CI
- Raised per-test timeout from 120s to 300s to prevent false failures on slow CI runners.
v0.24.0 — Deep Humanization for EN/RU/UK
Neural detector:
- Per-language feature normalization (RU/UK `char_entropy` baseline 4.8-4.9 vs 4.3 for EN)
- Expanded RU/UK conjunctions, transitions, and AI word sets for MLP features 27/33/34

Naturalizer:
- Transition-phrase deletion (22 EN / 23 RU / 23 UK patterns)
- Em-dash injection (`_comma_to_dash` + `_insert_dash_aside`)
- More aggressive burstiness (threshold 25 → 16-20, fragment insertion strategy)
- Light perplexity boost (rhetorical questions for formal profiles)
- Paragraph splitting (5+ sentence paragraphs)
- +30 EN word simplification entries

Pipeline:
- Intensity cap raised 70 → 85; multipliers 1.15 → 1.20 and 1.1 → 1.15
- Stage 13a: final entropy re-injection after grammar/coherence

Results (local backend, 3-sentence AI text, intensity=60):
- EN: 0.920 → 0.372 (human)
- RU: 0.880 → 0.390 (human)
- UK: 0.840 → 0.351 (human)

All 1,984 tests pass.
v0.23.0 - OSS LLM Backend, PyPI Publication
What's New
Backend Parameter
- New `backend` parameter: `local` (default), `oss`, `openai`, `auto`
- OSS backend: Free AI humanization via amd/gpt-oss-120b-chatbot on HuggingFace Spaces
- OpenAI backend: Optional paid backend using GPT-4o-mini
- Auto mode: tries OSS, then OpenAI, then falls back to local
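The auto-mode fallback chain can be sketched as a loop over backends, assuming each backend is a callable that raises on failure (illustrative only; the real selection logic lives inside the library):

```python
def auto_humanize(text, backends):
    """Try each (name, fn) backend in order, e.g. OSS -> OpenAI -> local;
    fall through to the next one on any failure."""
    last_error = None
    for name, fn in backends:
        try:
            return name, fn(text)
        except Exception as exc:  # network error, rate limit, missing key...
            last_error = exc
    raise RuntimeError("all backends failed") from last_error

def flaky_oss(text):
    raise TimeoutError("rate limited")

# Stand-in backends: the OSS one fails, the local one is a trivial rewrite.
backends = [("oss", flaky_oss),
            ("local", lambda t: t.replace("utilize", "use"))]
name, out = auto_humanize("please utilize it", backends)
```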
Install
```
pip install texthumanize==0.23.0
```
Usage
```python
from texthumanize import humanize
result = humanize('AI text', backend='oss')
```
Full Changelog: v0.15.0...v0.23.0
v0.15.0 — Full Audit Closure: 9 New Modules
What's New
9 New Core Modules
- `ai_backend` — Three-tier AI backend: OpenAI API → OSS Gradio model (rate-limited) → built-in rules. New `humanize_ai()` function.
- `pos_tagger` — Rule-based POS tagger for EN (500+ exceptions), RU/UK (200+), DE (300+). Universal tagset.
- `cjk_segmenter` — Chinese BiMM (2,504 entries), Japanese character-type, Korean space+particle segmentation.
- `syntax_rewriter` — 8 sentence-level transforms (active↔passive, clause inversion, enumeration reorder, adverb migration). 150+ irregular verbs.
- `statistical_detector` — 35-feature ML classifier for AI text detection. Integrated into `detect_ai()` with a 60/40 weighted merge.
- `word_lm` — Word-level unigram/bigram language model for 14 languages. Perplexity, burstiness, naturalness scoring.
- `collocation_engine` — PMI-based collocation scoring for context-aware synonym selection. EN ~130, RU ~30, DE ~20 collocations.
- `fingerprint_randomizer` — Anti-fingerprint diversification for output variety.
- `benchmark_suite` — 6-dimension automated quality benchmarking.
Pipeline & Detection
- Pipeline expanded to 17 stages (added syntax rewriting + anti-fingerprint diversification)
- `detect_ai()` now returns `combined_score` (statistical + heuristic)
- Fixed no-op `_reduce_adjacent_repeats()` — it now actually removes repetitions
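If the 60/40 merge is a plain weighted average, the combined score reduces to the following (an assumption for illustration, not a confirmed implementation detail):

```python
def combined_score(statistical, heuristic, w_stat=0.6, w_heur=0.4):
    """Weighted merge of the statistical classifier score and the
    heuristic score, both assumed to be in [0, 1]."""
    return w_stat * statistical + w_heur * heuristic
```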
Tests
- 1,696 tests — 92 new, all passing (100% pass rate)
v0.14.0
v0.14.0 -- Reliability, Analysis Tools & New APIs
New API Functions
- `humanize_sentences()` -- per-sentence AI scoring with graduated intensity; only rewrites sentences above a configurable AI probability threshold
- `humanize_variants()` -- generates 1-10 humanization variants with different random seeds, sorted by quality
- `humanize_stream()` -- generator that yields humanized text chunk-by-chunk with progress tracking
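The seeded-variants idea behind `humanize_variants()` can be sketched standalone (the rewrite and quality functions here are stand-ins, not the library's internals):

```python
import random

def make_variants(text, rewrite, quality, n=3):
    """Generate n seeded rewrites and return them sorted best-first."""
    variants = []
    for seed in range(n):
        rng = random.Random(seed)   # deterministic per-variant seed
        variants.append(rewrite(text, rng))
    return sorted(variants, key=quality, reverse=True)

def shuffle_words(text, rng):
    """Toy rewrite: permute the word order."""
    words = text.split()
    return " ".join(rng.sample(words, len(words)))

out = make_variants("one two three", shuffle_words, quality=len, n=3)
```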
New Analysis Modules (zero-dependency, offline)
- `perplexity_v2` -- character-level trigram cross-entropy model with `cross_entropy()` and `perplexity_score()` returning a naturalness score (0-100) and a verdict
- `dict_trainer` -- corpus analysis for custom dictionary building with `train_from_corpus()` and `export_custom_dict()`
- `plagiarism` -- offline originality detection via n-gram fingerprinting with `check_originality()` and `compare_originality()`
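Character-level trigram cross-entropy can be sketched roughly as follows; this is a toy model with add-one smoothing, and the module's real implementation may differ:

```python
import math
from collections import Counter

def train_trigrams(corpus):
    """Count character trigrams and their bigram contexts."""
    tri = Counter(corpus[i:i + 3] for i in range(len(corpus) - 2))
    bi = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    return tri, bi

def cross_entropy(text, model, vocab_size=128):
    """Average negative log2 probability per trigram (add-one smoothed).
    Higher values mean the text looks less like the training corpus."""
    tri, bi = model
    total, n = 0.0, 0
    for i in range(len(text) - 2):
        t = text[i:i + 3]
        p = (tri[t] + 1) / (bi[t[:2]] + vocab_size)
        total -= math.log2(p)
        n += 1
    return total / max(n, 1)

model = train_trigrams("the cat sat on the mat. the cat ran.")
```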
Pipeline Improvements
- Error isolation -- each processing stage wrapped in `_safe_stage()` with try/except; failing stages are skipped gracefully instead of crashing the pipeline
- Partial rollback -- the pipeline records checkpoints after each stage; on validation failure, it rolls back stage-by-stage to find the last valid state
- Pipeline profiling -- `stage_timings` dict and `total_time` included in `metrics_after` for performance analysis
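The error-isolation and profiling pattern described above, sketched standalone (the real `_safe_stage()` signature is internal and may differ):

```python
import time

def run_pipeline(text, stages):
    """Run named stages in order; skip any stage that raises, and
    record per-stage timings like a stage_timings dict."""
    timings, skipped = {}, []
    for name, fn in stages:
        start = time.perf_counter()
        try:
            text = fn(text)
        except Exception:
            skipped.append(name)  # failing stage is skipped, pipeline continues
        timings[name] = time.perf_counter() - start
    metrics = {"stage_timings": timings,
               "total_time": sum(timings.values()),
               "skipped": skipped}
    return text, metrics

def broken(text):
    raise ValueError("boom")

stages = [("lower", str.lower), ("broken", broken), ("strip", str.strip)]
out, metrics = run_pipeline("  HELLO  ", stages)
```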
Bug Fixes & Code Quality
- Fixed `adversarial_calibrate` intensity parameter (float 0-1 changed to int 0-100 to match the API)
- Added input sanitization: TypeError for non-str, ValueError for >500K chars, early return for empty text
- Thread-safe lazy loading with double-checked locking on all module loaders
- Instance-level plugins preventing cross-instance interference
- Fixed `humanize_sentences` crash (`detect_ai_sentences` returns a list, not a dict)
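Double-checked locking for lazy module loading, as a minimal standalone sketch (module state and loader are hypothetical):

```python
import threading

_lock = threading.Lock()
_module = None

def get_module(loader):
    """Load once; take the lock only on the slow path, then re-check
    under the lock (double-checked locking)."""
    global _module
    if _module is None:              # fast path: no lock once loaded
        with _lock:
            if _module is None:      # re-check: another thread may have won
                _module = loader()
    return _module

calls = []
def load():
    calls.append(1)                  # count actual loads
    return {"ready": True}

loaded = get_module(load)
loaded = get_module(load)            # second call hits the cache
```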
Tests
- 1,604 tests -- up from 1,560 (44 new tests for all v0.14.0 features)
- 100% pass rate
v0.13.0 — 16-Stage Pipeline, Grammar & Tone & Readability & Coherence
TextHumanize v0.13.0
4 new pipeline stages (12 to 16):
- Tone harmonization — match text tone to profile (academic/blog/seo/casual)
- Readability optimization — split complex sentences, join short ones
- Grammar correction — fix doubled words, spacing, typos (9 languages)
- Coherence repair — transitions between paragraphs, diversify openings
Dictionary expansion (~3,600 new entries):
- EN: +475 | RU: +430 | UK: +337
- DE/ES/FR/IT/PL/PT: ~235 each
- AR/ZH/JA/KO/TR: ~205 each
- Total: ~13,800 entries across 14 languages
Tests: 1,560 (all passing)
Full changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md
v0.12.0 — 14 Languages, Placeholder Safety, Watermark Pipeline
What's New
5 New Languages (14 total)
- Arabic (ar) — 81 bureaucratic, 80 synonyms, 49 AI connectors, 47 abbreviations
- Chinese Simplified (zh) — 80 bureaucratic, 80 synonyms, 36 AI connectors
- Japanese (ja) — 60+ per category, keigo to casual register replacements
- Korean (ko) — 60+ per category, honorific to casual register
- Turkish (tr) — 60+ per category, Ottoman to modern Turkish
Critical Bug Fixes
- Placeholder safety — all 6 processing modules now skip placeholder tokens; no more leaked placeholders in output
- 3-pass `restore()` — exact match, case-insensitive, orphan cleanup
- HTML block protection — ul, ol, table, pre, blockquote preserved as single segments
- Bare domain protection — site.com.ua, portal.kh.ua, example.co.uk etc.
- Homoglyph fix — removed Cyrillic characters from special homoglyphs table (was corrupting all Cyrillic text)
Pipeline Improvements
- Watermark cleaning — automatic first stage (12 stages total), removes zero-width chars, homoglyphs, invisible Unicode
- Language detection — Arabic/CJK/Turkish script detection added
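The zero-width-character part of watermark cleaning can be sketched as a simple character filter; the table below is a minimal illustration, and the library's actual list of invisible code points is likely larger:

```python
# Common invisible/zero-width code points used in text watermarking.
ZERO_WIDTH = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space (BOM)
}

def strip_invisible(text):
    """Remove zero-width/invisible code points from text."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)
```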
Tests
- 1,509 tests passed (54 new)
v0.11.0 — 3x Dictionary Expansion + Composer Fix
What's New
Massive Dictionary Expansion (3x total)
All 9 language dictionaries expanded from 2,281 to 6,881 entries (3.0x growth):
| Language | Before | After | Growth |
|---|---|---|---|
| English | 257 | 1,391 | 5.4x |
| Russian | 291 | 956 | 3.3x |
| Ukrainian | 252 | 780 | 3.1x |
| German | 235 | 724 | 3.1x |
| French | 263 | 599 | 2.3x |
| Spanish | 255 | 613 | 2.4x |
| Italian | 244 | 616 | 2.5x |
| Polish | 244 | 617 | 2.5x |
| Portuguese | 240 | 585 | 2.4x |
All 9 categories expanded: synonyms, bureaucratic words/phrases, AI connectors, sentence starters, colloquial markers, perplexity boosters, split conjunctions, abbreviations.
Bug Fixes
- Composer package name — the root `composer.json` had the incorrect name `ksanyok/texthumanize` (no hyphen). Fixed to `ksanyok/text-humanize`. Also changed `type` from `project` to `library` with proper Packagist metadata.
- TOC dots preservation — table-of-contents leader dots (`...........`) no longer collapse into an ellipsis.
Install
```
# Python
pip install texthumanize

# PHP
composer require ksanyok/text-humanize
```

1,455 tests passing.
v0.10.0 — Grammar, Uniqueness, Health Score, Semantic & Sentence Readability
What's New in v0.10.0
5 New Analysis Modules (all offline, no ML/API)
| Module | Function | Description |
|---|---|---|
| Grammar Checker | `check_grammar()` / `fix_grammar()` | Rule-based grammar checking for 9 languages |
| Uniqueness Score | `uniqueness_score()` / `compare_texts()` | N-gram fingerprinting uniqueness analysis |
| Content Health | `content_health()` | Composite quality: readability + grammar + uniqueness + AI + coherence |
| Semantic Similarity | `semantic_similarity()` | Measures semantic preservation between original and processed text |
| Sentence Readability | `sentence_readability()` | Per-sentence difficulty scoring (easy/medium/hard/very_hard) |
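If `content_health()` blends its five sub-scores linearly, the composite looks roughly like this; the weights below are hypothetical, not the library's actual values:

```python
def content_health(scores, weights=None):
    """Blend sub-scores (each 0-100) into one composite health score.
    Expected keys: readability, grammar, uniqueness, ai, coherence."""
    weights = weights or {"readability": 0.25, "grammar": 0.25,
                          "uniqueness": 0.2, "ai": 0.15, "coherence": 0.15}
    return sum(scores[k] * w for k, w in weights.items())

health = content_health({"readability": 80, "grammar": 90, "uniqueness": 70,
                         "ai": 60, "coherence": 100})
```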
Custom Dictionary API
```python
result = humanize(text, custom_dict={
    "implement": "build",
    "utilize": ["use", "apply", "employ"],  # random pick
})
```

Massively Expanded Dictionaries
All 9 language dictionaries balanced (367-439 entries each):
- FR: 281→397, ES: 275→388, IT: 272→379, PL: 257→368, PT: 256→367
- EN/RU/UK: added perplexity_boosters
Stats
- 28 files changed, +2333 lines
- 1455 tests passing (82 new)
- 17 new public exports
- Zero external dependencies
v0.9.0 — Kirchenbauer Watermark, HTML Diff, Quality Gate, Selective Humanization, Stylometric Anonymizer
What's New
Kirchenbauer Watermark Detector
Green-list z-test based on Kirchenbauer et al. 2023. Uses SHA-256 hash of previous token to partition vocabulary into green/red lists (γ=0.25), computes z-score and p-value. Flags AI watermark at z ≥ 4.0.
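The statistic described above reduces to a binomial z-test on the green-token count; a sketch of that computation, with a simplified stand-in for the SHA-256 vocabulary partition:

```python
import hashlib
import math

def is_green(prev_token, token, gamma=0.25):
    """Pseudo-random green/red partition seeded by the previous token
    (simplified: uses one hash byte as a uniform draw)."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return (h[0] / 255.0) < gamma

def green_z(green_count, total, gamma=0.25):
    """z-score of the observed green fraction against the expected gamma;
    a value >= 4.0 would flag a likely watermark."""
    expected = gamma * total
    return (green_count - expected) / math.sqrt(total * gamma * (1 - gamma))
```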
```python
from texthumanize import detect_watermarks
report = detect_watermarks(text)
print(report.kirchenbauer_score, report.kirchenbauer_p_value)
```

HTML Diff Report
`explain()` now supports multiple output formats:

```python
html = explain(result, fmt='html')      # self-contained HTML page
json_str = explain(result, fmt='json')  # RFC 6902 JSON Patch
diff = explain(result, fmt='diff')      # unified diff
```

Quality Gate
CLI + GitHub Action + pre-commit hook to check text for AI artifacts:

```
python -m texthumanize.quality_gate README.md docs/ --ai-threshold 25
```

Selective Humanization
Process only AI-flagged sentences, leaving human text untouched:

```python
result = humanize(text, only_flagged=True)
```

Stylometric Anonymizer
Disguise authorship by transforming text toward a target style:

```python
from texthumanize import anonymize_style
result = anonymize_style(text, target='blogger')
```

Stats
- 1,373 Python tests passing
- 40 new tests for v0.9.0 features
- Ruff lint clean
- 22 files changed, 1,637 additions