You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CHANGELOG.md
+26Lines changed: 26 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -3,6 +3,32 @@
3
3
All notable changes to this project are documented in this file.
4
4
Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
5
5
6
+
## [0.15.0] - 2025-06-28
7
+
8
+
### Added
9
+
-**9 new core modules** — full audit gap closure (100% of C1-C4, H1-H7, M1-M5, N1-N8 items):
10
+
-`ai_backend` — Three-tier AI backend: OpenAI API → OSS Gradio model (rate-limited, circuit-breaker) → built-in rules. New `humanize_ai()` function in core.
11
+
-`pos_tagger` — Rule-based POS tagger for EN (500+ exceptions), RU/UK (200+ each), DE (300+). Universal tagset with context disambiguation.
12
+
-`cjk_segmenter` — Chinese BiMM (2504-entry dict), Japanese character-type, Korean space+particle segmentation. Functions: `segment_cjk()`, `is_cjk_text()`, `detect_cjk_lang()`.
-`statistical_detector` — 35-feature AI text classifier with logistic regression. EN 85+ AI markers, RU 38+ markers. Integrated into `detect_ai()` with 60/40 weighted merge (heuristic/statistical).
15
+
-`word_lm` — Word-level unigram/bigram language model replacing character-trigram perplexity. 14 language frequency tables. Perplexity, burstiness, and naturalness scoring.
16
+
-`collocation_engine` — PMI-based collocation scoring for context-aware synonym ranking. EN ~130, RU ~30, DE ~20, FR ~15, ES ~12 collocations.
17
+
-`fingerprint_randomizer` — Anti-fingerprint diversification: plan randomization, synonym pool variation, whitespace jitter, paragraph intensity variation. Integrated as pipeline stage 13b.
-**Pipeline expanded to 17 stages** — added `syntax_rewriting` (stage 7b) and anti-fingerprint diversification (stage 13b).
20
+
-**92 new tests** for all v0.15.0 modules — AI backend, POS tagger, CJK segmenter, syntax rewriter, statistical detector, word LM, collocation engine, fingerprint randomizer, benchmark suite, plus integration tests.
21
+
22
+
### Fixed
23
+
-**NO-OP `_reduce_adjacent_repeats()`** — was finding repeated words but doing `pass`. Now correctly removes second occurrences within a sliding window of 8 words, with article removal support.
24
+
-**Paragraph whitespace preservation** — `_reduce_adjacent_repeats()` now uses `re.split(r'(\s+)')` to tokenize while preserving `\n\n` paragraph breaks.
0 commit comments