- Targets: existing classifier labels scam, clean, topic_crypto (multi-label, ~4k samples)
- Teacher: cardiffnlp/twitter-roberta-large-2022-154m (primary), vinai/bertweet-large (alt)
- Ensemble strategy: 3 seeds at 4k samples, single teacher at 100k-1M scale
- Dual-head: softmax head (scam/clean) + sigmoid head (topics)
- Distillation: logit-based, T=2-4, intermediate-layer matching, DAPT
- Student: 4-layer BERT, hidden size 192, 4 attention heads, int8 ONNX ≤5 MB
- Daily log entry for 2026-02-13
…ples

Deep scrape progress:
- replies.jsonl: 14→62 ground truth samples (+48)
- 61 levelsio callouts collected (Jan 15 - Feb 13)
- 27 new entries from deep fetch (14 deleted/missing parent)
- Bot handles: 525, Taggers: 18
- Still more to collect going back to Dec 2022
- 113 levelsio callouts collected (Dec 13 2025 - Feb 13 2026)
- 103 ground truth samples in replies.jsonl (was 61)
- 527 bot handles
- ~40% deletion rate on flagged tweets (bots delete after callout)
…ning had deleted parents)
Deploying janitr with Cloudflare Pages

| Latest commit: | efd030b |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://3756ea41.janitr.pages.dev |
| Branch Preview URL: | https://feat-tiny-transformer-impl.janitr.pages.dev |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: beef260475
```typescript
const ENGINE_FASTTEXT: Engine = "fasttext";
const ENGINE_TRANSFORMER: Engine = "transformer";
const ENGINE_AUTO: Engine = "auto";
const DEFAULT_ENGINE: Engine = ENGINE_TRANSFORMER;
```
Default to fastText until bundled transformer assets exist
DEFAULT_ENGINE now points to transformer, but the transformer loader expects bundled files (student.int8.onnx, student_config.json, tokenizer/vocab.txt, thresholds.json) that are not present in this commit; I verified with rg --files and only fastText assets are tracked. On a fresh install, every inference request will attempt transformer first, fail, and then fall back, which adds repeated failing work and noisy fallback behavior in normal usage. Keep the default on fastText unless those bundled transformer artifacts are shipped.
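One way to implement the suggestion is to keep fastText as the default and resolve the engine choice once, against whether the bundled transformer assets are actually present, instead of failing per-request. This is a hedged sketch: the `Engine` type and `ENGINE_*` constants come from the diff above, while `resolveEngine` and the `transformerAssetsPresent` flag are illustrative names we introduce here.

```typescript
type Engine = "fasttext" | "transformer" | "auto";

const ENGINE_FASTTEXT: Engine = "fasttext";
const ENGINE_TRANSFORMER: Engine = "transformer";
// Keep the safe default until student.int8.onnx etc. are shipped.
const DEFAULT_ENGINE: Engine = ENGINE_FASTTEXT;

// Resolve the requested engine once against asset availability, so
// inference requests never repeatedly attempt-and-fail the transformer.
function resolveEngine(requested: Engine, transformerAssetsPresent: boolean): Engine {
  if (requested === "transformer" && !transformerAssetsPresent) {
    return ENGINE_FASTTEXT; // explicit request, but nothing to load
  }
  if (requested === "auto") {
    return transformerAssetsPresent ? ENGINE_TRANSFORMER : ENGINE_FASTTEXT;
  }
  return requested;
}
```

A single startup probe for the bundled files can feed `transformerAssetsPresent`, keeping the fallback decision out of the hot path.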
```typescript
ortWasmPathPrefix: runtimeWasmBaseUrlForDir(
  DEFAULT_HF_EXPERIMENTS_REPO,
  RUNTIME_DIR_CANDIDATES[0],
),
```
Use local ORT wasm path for bundled transformer mode
Bundled mode currently sets ortWasmPathPrefix to a Hugging Face URL, so ONNX Runtime must fetch its wasm binary from the network even when the source is builtin. That means transformer mode cannot be truly local/offline and will fail (then fall back) in environments where HF is blocked or unavailable. builtin should resolve runtime wasm from extension-packaged assets (e.g., via runtime URL) rather than a remote host.
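A sketch of the suggested fix, assuming a WebExtension context: resolve the wasm prefix from packaged assets via `chrome.runtime.getURL` when the source is builtin, and only use the remote prefix otherwise. The `"ort/"` packaged directory and the `ortWasmPrefix` helper are assumptions for illustration; `chrome.runtime.getURL` is the standard extension API for packaged-asset URLs.

```typescript
declare const chrome: any; // extension global; absent outside extensions

function ortWasmPrefix(source: "builtin" | "remote", remotePrefix: string): string {
  if (source === "builtin" && typeof chrome !== "undefined" && chrome.runtime?.getURL) {
    // Serve ONNX Runtime wasm from extension-packaged files, so
    // builtin mode works fully offline (directory name assumed).
    return chrome.runtime.getURL("ort/");
  }
  return remotePrefix;
}
```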
```typescript
}
return new Promise<StoragePayload>((resolve, reject) => {
  area.get(key, (value: StoragePayload) => {
    const err = chrome?.runtime?.lastError || browser?.runtime?.lastError;
```
Guard browser namespace access in storage callbacks
This callback error check can throw ReferenceError: browser is not defined in Chrome callback-mode paths when chrome.runtime.lastError is empty, because the right-hand browser?.runtime?.lastError is still evaluated. In that case storageGet/storageSet fail on successful operations, breaking backend/source persistence on fallback API paths. Add a typeof browser !== 'undefined' guard before touching browser.
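The guard the review asks for can be sketched as a small helper: `typeof x !== "undefined"` is safe on an undeclared global, whereas the original `browser?.runtime?.lastError` still dereferences the `browser` binding and throws in Chrome. The `getLastError` name is illustrative, not from the codebase.

```typescript
// Extension globals; one or both may be absent depending on browser.
declare const chrome: any;
declare const browser: any;

function getLastError(): { message?: string } | undefined {
  // typeof guards first, so touching an undeclared namespace can
  // never raise ReferenceError in either browser's callback path.
  if (typeof chrome !== "undefined" && chrome.runtime?.lastError) {
    return chrome.runtime.lastError;
  }
  if (typeof browser !== "undefined" && browser.runtime?.lastError) {
    return browser.runtime.lastError;
  }
  return undefined;
}
```

With this in place, `storageGet`/`storageSet` callbacks can check `getLastError()` and resolve successfully when it returns `undefined`.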