Merged
146 changes: 115 additions & 31 deletions README.md
@@ -16,27 +16,38 @@ pytest -q

<table width="100%">
<tr>
<td colspan="12" width="100.0%" valign="top">
<td width="100%" valign="top">
<div><span><b>00 · GNOSIS-MORPH-BENCH</b> · ANCIENT SCRIPT RESEARCH UTILITY</span> <span>RESEARCH-READY · LIVE INDUS BLOCKED</span></div>
<h1>Toolkitting for ancient script <span>research.</span></h1>
<p>A shared scoring rig for ancient-script methods &middot; PyPI <em>gnosis-morph-bench</em> v0.1.0 &middot; github.com/Zer0pa/Morph-Bench</p>
<p>Every researcher studying ancient-script morphology rebuilds the same machinery from scratch: scoring routes, permutation nulls, stability sweeps. The result lives in one paper, then the next lab starts again from nothing. <em>Morph-Bench</em> is the shared rig that ends that cycle: it scores routes, preserves null results, runs stability checks, and keeps replayable records so the next study begins from a measured floor. Today <strong>37 tests pass</strong> and fixture runs are byte-identical across macOS and Linux on Python 3.11. The live Indus Phase 4 rerun is <strong>blocked</strong> on Phase 3c manifest access.</p>
</td>
</tr>
</table>

<table width="100%">
<tr>
<td colspan="12" width="100.0%" align="center" valign="top">
<td width="100%" valign="top">
<figure>
<div><img src="docs/assets/product-page-mechanics.gif" alt="Gnosis-Morph-Bench approved scientific square mechanics diagram showing morphology benchmark-grid mechanics."></div>
<figcaption><b>Scope:</b> morphology benchmark rig. Scoring, permutation nulls, stability sweeps, and replay pass; live Indus Phase 4 is blocked.</figcaption>
</figure>
</td>
</tr>
</table>

<table width="100%">
<tr>
<td colspan="4" width="33.33%" valign="top">
<td width="100%" valign="top">
<div><b>01 · THE GAP</b> <span>NO MEASURE, NO BASELINE</span></div>
<h2>Every ancient-script researcher rebuilds the same baseline from scratch. The measure is never kept.</h2>
</td>
<td colspan="8" width="66.67%" valign="top">
</tr>
</table>

<table width="100%">
<tr>
<td width="100%" valign="top">
<div><b>02 · MARKETS</b> <span>ADJACENT FORECASTS</span></div>
<div>
<div>
@@ -50,29 +61,38 @@ pytest -q
<div>Every field on this list has ancient-script work underway. None of it starts from a measured baseline.</div>
</td>
</tr>
</table>

<table width="100%">
<tr>
<td colspan="6" width="50.0%" valign="top">
<td width="50%" valign="top">
<div><b>03 · VALUE</b></div>
<div>$8.1<span>B</span></div>
<div>Cultural-heritage digitization in 2030; research infrastructure for review and reuse, not commercial sale.</div>
</td>
<td colspan="6" width="50.0%" valign="top">
<td width="50%" valign="top">
<div><b>04 · INSIGHT</b></div>
<h2>Measure the method once. <span>The next researcher starts from here.</span></h2>
</td>
</tr>
</table>

<table width="100%">
<tr>
<td colspan="6" width="50.0%" valign="top">
<td width="50%" valign="top">
<div><b>05.1 · CURRENT TECH</b> <span>REBUILT EVERY TIME</span></div>
<p>Ancient-script research has no standard scoring rig. Each lab writes its own metric, picks its own null, records its own result — then the work moves on. The next researcher starts again from zero.</p>
</td>
<td colspan="6" width="50.0%" valign="top">
<td width="50%" valign="top">
<div><b>05.2 · OUR TECH</b> <span>THE BASELINE, KEPT</span></div>
<p><em>Morph-Bench</em> installs from PyPI and gives a researcher a working starting point: a route scorer, a permutation null, a five-mode stability battery, and replayable records that are byte-identical across macOS and Linux on Python 3.11. <strong>37 tests</strong> pass on a fresh clone, and <strong>9 of 9</strong> adapter clauses hold.</p>
</td>
</tr>
</table>
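
The permutation null named in 05.2 can be illustrated with a short sketch. This is not the `gnosis-morph-bench` API: `permutation_null` and `toy_route_score` are hypothetical names, and the toy score assumes binary labels and a fixed 0.5 threshold. The idea is to score the real labels once, rescore many shuffled copies of the labels, and report how often chance alone matches the observed score.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature for a route scorer: (features, labels) -> score.
ScoreFn = Callable[[Sequence[float], Sequence[int]], float]


def toy_route_score(features: Sequence[float], labels: Sequence[int]) -> float:
    """Toy stand-in for a route scorer: fraction of items where a
    fixed 0.5 threshold on the feature agrees with the binary label."""
    hits = sum(1 for x, y in zip(features, labels) if (x > 0.5) == bool(y))
    return hits / len(labels)


def permutation_null(
    score_fn: ScoreFn,
    features: Sequence[float],
    labels: Sequence[int],
    n_permutations: int = 200,
    seed: int = 0,
) -> Tuple[float, List[float], float]:
    """Return (observed score, null scores, one-sided p-value)."""
    rng = random.Random(seed)  # seeded so the null itself can be replayed
    observed = score_fn(features, labels)
    shuffled = list(labels)
    null_scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the feature/label pairing
        null_scores.append(score_fn(features, shuffled))
    # Add-one correction keeps the p-value strictly above zero.
    exceed = sum(1 for s in null_scores if s >= observed)
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, null_scores, p_value


features = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
observed, null_scores, p_value = permutation_null(toy_route_score, features, labels)
print(f"observed={observed:.2f} p={p_value:.3f}")
```

Keeping `null_scores` alongside the headline number, rather than only the p-value, is what lets a later reader inspect the null distribution instead of trusting a summary.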

<table width="100%">
<tr>
<td colspan="12" width="100.0%" valign="top">
<td width="100%" valign="top">
<div><b>05.3 · BENCHMARKS</b> <span>PUBLIC RECEIPTS</span></div>
<div>
<div>
@@ -90,14 +110,15 @@ pytest -q
<div><b>Status:</b> working today. Live Indus Phase 4 rerun blocked on Phase 3c manifest.</div>
</td>
</tr>
</table>

<table width="100%">
<tr>
<td colspan="12" width="100.0%" valign="top">
<td width="34%" valign="top">
<div><b>06 · MEASUREMENT</b> <span>METHOD SCORING + REPLAY</span></div>
<h2>Every method scored. <span>Every null kept. Every blocker named.</span></h2>
</td>
</tr>
<tr>
<td colspan="12" width="100.0%" valign="top">
<td width="66%" valign="top">
<div><b>06.1 · COMPARATIVE PERFORMANCE · BENCHMARK SURFACE</b></div>
<div>
<div>
@@ -110,92 +131,155 @@ pytest -q
<div>Today's measurement is the rig itself: pytest on a fresh Python 3.11 clone, adapter contract coverage, forbidden-pattern lint, and byte-equal replay across macOS and Linux. The live Indus Phase 4 rerun and the Cuneiform adapter are <strong>not</strong> yet in scope.</div>
</td>
</tr>
</table>

<table width="100%">
<tr>
<td colspan="12" width="100.0%" valign="top">
<td width="100%" valign="top">
<div><b>07 · KEY METRICS</b> <span>PUBLIC RECEIPTS</span></div>
</td>
</tr>
</table>

<table width="100%">
<tr>
<td width="20%" valign="top">
<td width="100%" valign="top">
<div><b>07.1 · PYTEST SURFACE</b></div>
<div>37<span>/37</span></div>
<div>Fresh clone &middot; <b>pytest passes end to end</b></div>
</td>
<td width="20%" valign="top">
</tr>
</table>

<table width="100%">
<tr>
<td width="100%" valign="top">
<div><b>07.2 · ADAPTER INTERFACE</b></div>
<div>9<span>/9</span></div>
<div>Adapter v1 &middot; <b>required clauses covered</b></div>
</td>
<td width="20%" valign="top">
</tr>
</table>

<table width="100%">
<tr>
<td width="100%" valign="top">
<div><b>07.3 · FORBIDDEN-PATTERN LINT</b></div>
<div>0<span>/6</span></div>
<div>Repo lint &middot; <b>no forbidden patterns</b></div>
</td>
<td width="20%" valign="top">
</tr>
</table>

<table width="100%">
<tr>
<td width="100%" valign="top">
<div><b>07.4 · PYPI RELEASE</b></div>
<div>0.1.0</div>
<div>Public PyPI &middot; <b>wheel and source ready</b></div>
</td>
<td width="20%" valign="top">
</tr>
</table>

<table width="100%">
<tr>
<td width="100%" valign="top">
<div><b>07.5 · LIVE INDUS RERUN</b></div>
<div>null</div>
<div>Blocked &middot; <b>Phase 3c manifest unblocks it</b></div>
</td>
</tr>
</table>

<table width="100%">
<tr>
<td colspan="4" width="33.33%" valign="top">
<td width="100%" valign="top">
<div><b>08 · FIXTURE REPLAY</b> <span>BYTE-STABLE SCOPE</span></div>
<h2>Replay is byte-stable for fixture records, <span>not rights-limited heavy data.</span></h2>
</td>
<td colspan="5" width="41.67%" valign="top">
</tr>
</table>

<table width="100%">
<tr>
<td width="66%" valign="top">
<div><b>08.1 · WHAT REPLAYS EXACTLY</b> <span>FIXTURE RECORDS ONLY</span></div>
<p>Smoke and replay outputs are <strong>byte-identical across documented macOS and Linux Python 3.11 environments</strong>, verified end-to-end through fresh-clone install. Replay records and SHA-256 reference-freeze helpers ship as part of the public package.</p>
<p>That does not prove deterministic live Indus replay from repo custody, undisclosed heavy data, or cultural-heritage imagery — and it does not prove any text recovery. The unit of byte-exactness is <em>benchmark-fixture replay</em>, not domain-result reproduction.</p>
</td>
<td colspan="3" width="25.0%" valign="top">
<td width="34%" valign="top">
<div><b>08.2 · HONEST BLOCKER</b></div>
<span>Honest Blocker &middot;</span>
<p>The live Indus Phase 4 rerun is blocked on Phase 3c manifest access. The release policy for heavy data and image-bearing data is still open. The Cuneiform adapter is deferred to a separate contract once the first live Indus replay lands. PyPI: <code>gnosis-morph-bench==0.1.0</code>; no <code>0.1.1</code> until a receipt backs it. Non-claims: <em>no text recovery, no source identification, no data permission, no hosted service</em>.</p>
</td>
</tr>
</table>
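
Byte-exact fixture replay of the kind described in 08.1 can be checked with a SHA-256 digest over a canonical serialization. The sketch below is an assumption-laden illustration, not the package's shipped reference-freeze helpers: `canonical_bytes`, `freeze`, and `replays_exactly` are hypothetical names, and the record fields are invented for the example.

```python
import hashlib
import json
from typing import Any, Mapping


def canonical_bytes(record: Mapping[str, Any]) -> bytes:
    """Serialize a record so that key order and whitespace cannot
    change the bytes: sorted keys, compact separators, ASCII only."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def freeze(record: Mapping[str, Any]) -> str:
    """Return the SHA-256 hex digest of the canonical record bytes."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def replays_exactly(record: Mapping[str, Any], reference_digest: str) -> bool:
    """True when a replayed record matches the frozen reference digest."""
    return freeze(record) == reference_digest


# Hypothetical fixture record; the field names are illustrative only.
reference = freeze({"route": "fixture-a", "score": 0.75, "seed": 0})

same = {"seed": 0, "score": 0.75, "route": "fixture-a"}     # keys reordered
drifted = {"route": "fixture-a", "score": 0.76, "seed": 0}  # score changed

print(replays_exactly(same, reference))     # True
print(replays_exactly(drifted, reference))  # False
```

A digest match of this kind only shows that two fixture records are the same bytes. As 08.1 states, it says nothing about live Indus replay or domain-result reproduction.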

<table width="100%">
<tr>
<td colspan="6" width="50.0%" valign="top">
<td width="33%" valign="top">
<div><b>09</b> </div>
<h2>ONE BASELINE THE FIELD <span>CAN INHERIT.</span></h2>
</td>
<td colspan="6" width="50.0%" valign="top">
<td width="67%" valign="top">
<div><b>09.1 · THE AMBITION</b></div>
<p>An ancient-script field that loses its scoring rig with every postdoc loses a generation of measurement. <em>Morph-Bench</em> is the rig the next lab installs instead of writing — a public floor for method comparison that outlives the researcher who first ran it.</p>
</td>
</tr>
</table>

<table width="100%">
<tr>
<td colspan="6" width="50.0%" valign="top">
<td width="33%" valign="top">
<div><b>09.2 · WHAT WORKS NOW</b></div>
<h2>Working today: 37 tests pass, adapter contract holds, fixture replay is byte-stable, public PyPI release available.</h2>
</td>
<td colspan="6" width="50.0%" valign="top">
<td width="67%" valign="top">
<div><b>09.3 · WHAT'S STILL OPEN</b></div>
<h2>Still blocked: live Indus Phase 4 rerun, heavy-data release policy, and a future Cuneiform adapter.</h2>
</td>
</tr>
</table>

<table width="100%">
<tr>
<td width="20%" valign="top">
<td width="100%" valign="top">
<div><b>09.4</b> &middot; LABS · NEAR-TERM (12–24 MO)</div>
<div>One installable baseline, not three rebuilds</div><div>A second lab studying Indus morphology can install the same scoring rig their predecessor used, run it on their candidate method, and compare results that have a common floor. The conversation moves from "whose code" to "whose finding".</div>
</td>
<td width="20%" valign="top">
</tr>
</table>

<table width="100%">
<tr>
<td width="100%" valign="top">
<div><b>09.5</b> &middot; REVIEW · NEAR-TERM (12–24 MO)</div>
<div>Reviewers see the null beside the result</div><div>A journal reviewer reading a morphology paper can ask for the permutation null and the stability sweep alongside the headline number, knowing both were produced by a public harness, not the author's private script. Skeptical review gets cheaper.</div>
</td>
<td width="20%" valign="top">
</tr>
</table>

<table width="100%">
<tr>
<td width="100%" valign="top">
<div><b>09.6</b> &middot; MUSEUMS · MID-TERM (24–48 MO)</div>
<div>Heritage archives keep the method, not just the picture</div><div>When a museum funds a script-morphology study, the resulting paper can ship with a replayable run — same scoring rig, same null check, same stability battery — so the archive's investment outlives the postdoc who did the work. The method becomes part of the collection.</div>
</td>
<td width="20%" valign="top">
</tr>
</table>

<table width="100%">
<tr>
<td width="100%" valign="top">
<div><b>09.7</b> &middot; CROSS-SCRIPT · MID-TERM (24–48 MO)</div>
<div>Indus and Cuneiform share a floor</div><div>Once a Cuneiform scorer sits beside the Indus one, two distant script communities can argue about results on the same scoring floor. Disagreement narrows to the data and the model — the scoring rig stays the same, so the argument finally becomes worth having.</div>
</td>
<td width="20%" valign="top">
</tr>
</table>

<table width="100%">
<tr>
<td width="100%" valign="top">
<div><b>09.8</b> &middot; CONVENTION · PARADIGM (48 MO+)</div>
<div>A standard the next generation inherits</div><div>A graduate student starting in ancient-script morphology in 2030 finds the same baseline their advisor used in 2026 — installable, documented, with null results preserved. The field stops losing a generation of measurement every time the tools rot. One kept harness, many researchers.</div>
</td>