CLI: surface result.debug via --json field or stderr log #331

Open
kaznak wants to merge 2 commits into kepano:main from kaznak:feat/cli-debug-output

@kaznak kaznak commented Jun 26, 2026

Problem

Defuddle({ debug: true }) populates result.debug = { contentSelector, removals: DebugRemoval[] } so callers can see which step dropped what. The CLI accepts a --debug flag and passes debug: true through, but nothing in the CLI ever surfaces the resulting result.debug:

  • the --json payload's field list in src/cli.ts does not include debug, so it is dropped silently.
  • nothing is written to stderr either.

The practical effect is that --debug is a no-op for CLI users. The most common reason to enable it on the command line is to find out which removal pass stripped an element while developing a site-specific extractor, and the CLI cannot serve that use case today.

Proposal

Route result.debug through the CLI:

1. With --json --debug — include the existing DebugInfo payload in the JSON object, alongside the other fields:

{
  "content": "...",
  "title": "...",
  ...,
  "debug": {
    "contentSelector": "html > body > main > article",
    "removals": [
      { "step": "removeByContentPattern", "reason": "blog metadata list", "text": "Hugging Face ModelScope" },
      ...
    ]
  }
}

The field shape is unchanged: it is DebugInfo from src/types.ts. Machine consumers (jq pipelines, extractor-development scripts) can read it directly.
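
As a rough sketch of how a machine consumer might use the field, the snippet below parses a `--json --debug` payload and tallies removals per step. The interfaces only mirror the fields visible in the example above; the real `DebugInfo` and `DebugRemoval` in src/types.ts may differ, and `countRemovalsByStep` is a hypothetical helper name, not part of this PR.

```typescript
// Assumed shapes, inferred from the example payload above.
// The authoritative definitions live in src/types.ts and may have more fields.
interface DebugRemoval {
  step: string;
  reason?: string;
  selector?: string;
  text?: string;
}

interface DebugInfo {
  contentSelector: string;
  removals: DebugRemoval[];
}

// Hypothetical helper: count how many removals each step produced.
function countRemovalsByStep(debug: DebugInfo): Map<string, number> {
  const counts = new Map<string, number>();
  for (const removal of debug.removals) {
    counts.set(removal.step, (counts.get(removal.step) ?? 0) + 1);
  }
  return counts;
}

// Sample payload standing in for the stdout of a --json --debug run.
const sampleStdout = JSON.stringify({
  content: "...",
  title: "...",
  debug: {
    contentSelector: "html > body > main > article",
    removals: [
      { step: "removeBySelector", selector: ".ad", text: "Advertisement" },
      { step: "removeByContentPattern", reason: "blog metadata list", text: "Hugging Face ModelScope" },
      { step: "removeByContentPattern", reason: "breadcrumb navigation list", text: "Home > AI > World Models" },
    ],
  },
});

// The debug field is optional: it is absent when --debug was not passed.
const parsedPayload = JSON.parse(sampleStdout) as { debug?: DebugInfo };
const stepCounts = parsedPayload.debug
  ? countRemovalsByStep(parsedPayload.debug)
  : new Map<string, number>();

stepCounts.forEach((count, step) => {
  console.log(`${step}: ${count}`);
});
```

Treating `debug` as optional keeps the same consumer working against output produced without --debug.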

2. With --debug and no --json — format the same payload as a one-line-per-removal human-readable log and write it to stderr. Stdout still gets the article content unchanged, so existing pipelines keep working. Example stderr output:

# defuddle --debug
contentSelector: html > body > main > article
removals: 12
[removeBySelector] (.ad) — Advertisement
[removeByContentPattern] blog metadata list — Hugging Face ModelScope
[removeByContentPattern] breadcrumb navigation list — Home > AI > World Models
...

3. Without --debug — no behavioral change. JSON output omits the debug field, plain output is identical to before.

Implementation lives entirely in src/cli.ts: a new formatDebugLog() helper, a debugLog?: string field on the parseSource return value, and a change to the action wrapper, which writes that field to process.stderr.
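
The three modes can be sketched roughly as follows. This is not the code from the PR: the type shapes are inferred from the examples above, `buildOutput`, `CliResult`, `CliOutput` and the 80-character preview limit are assumptions made for illustration, and only the name `formatDebugLog` and the `debugLog` field come from the description.

```typescript
// Assumed shapes; the real ones are DebugInfo / DebugRemoval in src/types.ts.
interface DebugRemoval {
  step: string;
  reason?: string;
  selector?: string;
  text?: string;
}

interface DebugInfo {
  contentSelector: string;
  removals: DebugRemoval[];
}

// Separator used in the example stderr output (an em dash with spaces).
const SEPARATOR = " \u2014 ";
// Assumed cap on the text preview; the PR does not state a limit.
const PREVIEW_LIMIT = 80;

// Format the debug payload as a header plus one line per removal.
function formatDebugLog(debug: DebugInfo): string {
  const lines = [
    "# defuddle --debug",
    `contentSelector: ${debug.contentSelector}`,
    `removals: ${debug.removals.length}`,
  ];
  for (const removal of debug.removals) {
    // Label is the reason and/or the selector in parentheses.
    const label = [removal.reason, removal.selector ? `(${removal.selector})` : undefined]
      .filter((part): part is string => Boolean(part))
      .join(" ");
    // Collapse whitespace so each removal stays on a single line.
    const preview = (removal.text ?? "").replace(/\s+/g, " ").trim().slice(0, PREVIEW_LIMIT);
    const detail = [label, preview].filter((part) => part.length > 0).join(SEPARATOR);
    lines.push(detail ? `[${removal.step}] ${detail}` : `[${removal.step}]`);
  }
  return lines.join("\n") + "\n";
}

// Hypothetical stand-ins for the parse result and the CLI return value.
interface CliResult {
  content: string;
  title?: string;
  debug?: DebugInfo;
}

interface CliOutput {
  output: string;    // goes to stdout
  debugLog?: string; // goes to stderr, plain mode only
}

function buildOutput(result: CliResult, opts: { json: boolean; debug: boolean }): CliOutput {
  if (opts.json) {
    // JSON mode: the debug payload travels inside the JSON object itself.
    const payload: Record<string, unknown> = { content: result.content, title: result.title };
    if (opts.debug && result.debug) payload.debug = result.debug;
    return { output: JSON.stringify(payload, null, 2) };
  }
  // Plain mode: content is untouched; the log is returned separately.
  const out: CliOutput = { output: result.content };
  if (opts.debug && result.debug) out.debugLog = formatDebugLog(result.debug);
  return out;
}
```

Returning the log as a string rather than writing it inside the helper keeps the formatting testable without capturing stderr; the caller can then pass `debugLog` to `process.stderr.write` when it is defined.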

Tests

tests/cli.test.ts adds:

  • --debug --json embeds a debug field with contentSelector (string) and removals (array). debugLog is undefined, because the JSON path owns the output.
  • --debug without --json returns a debugLog string starting with # defuddle --debug and containing contentSelector: and removals:. The primary output field stays content-only.
  • Without --debug: JSON has no debug field, plain mode has no debugLog.

All 371 existing tests continue to pass.

Notes

kaznak added 2 commits June 26, 2026 13:47
The Defuddle library populates `result.debug = { contentSelector,
removals: DebugRemoval[] }` when `debug: true`, but the CLI never
exposed it: `result.debug` was omitted from the `--json` payload's
field list, and nothing was written to stderr either, so `--debug`
was effectively a no-op for CLI users.

This change routes the debug payload through the CLI:

- With `--json --debug`: the JSON object gains a `debug` field
  alongside the existing fields. The shape matches `DebugInfo` from
  `src/types.ts` so machine consumers can parse it directly.

- With `--debug` and no `--json`: `parseSource` returns a
  `debugLog` string formatted one removal per line (step, reason,
  selector, text preview). The action wrapper writes it to stderr
  so it does not interleave with the content body on stdout.

- Without `--debug`: no behavioral change. The JSON output omits
  `debug` and `debugLog` is undefined.

Three tests cover the three modes.