
feat: add Typst as a spec format #185

Open
lovesegfault wants to merge 49 commits into bearcove:main from lovesegfault:typst-spec

@lovesegfault
Contributor

@lovesegfault lovesegfault commented Apr 8, 2026

Support .typ files as spec documents alongside Markdown. Adds a
SpecFormat enum dispatching to per-format parsers; the Markdown path
wraps marq unchanged, the Typst path parses requirement markers via
arborium-typst (tree-sitter) and renders HTML via the typst compiler
behind a typst-spec feature (default-on, ~340 transitive crates).
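
A minimal sketch of what extension-based dispatch could look like. `SpecFormat`, `from_ext` and `is_spec_extension` are names taken from this PR, but the bodies and signatures below are assumptions, and `from_path` is a hypothetical helper added for illustration:

```rust
use std::path::Path;

/// Illustrative stand-in for the PR's `SpecFormat`; not the real definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpecFormat {
    Markdown,
    Typst,
}

impl SpecFormat {
    /// Map a file extension (without the leading dot) to a format.
    pub fn from_ext(ext: &str) -> Option<Self> {
        match ext {
            "md" => Some(Self::Markdown),
            "typ" => Some(Self::Typst),
            _ => None,
        }
    }

    /// Hypothetical helper: derive the format from a path's extension.
    pub fn from_path(path: &Path) -> Option<Self> {
        path.extension()?.to_str().and_then(Self::from_ext)
    }
}

/// A path is a spec candidate exactly when some format claims its extension.
pub fn is_spec_extension(ext: &str) -> bool {
    SpecFormat::from_ext(ext).is_some()
}
```

Routing every `ext == "md"` check through one function like this means a new format only has to be taught to `from_ext`.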

Spec authors write #r("rule.id")[body] and import the new
@preview/tracey typst package for standalone compilation and editor
support; tracey strips that import and substitutes its own
sentinel-emitting definitions when rendering for the dashboard.
Relative imports resolve against the spec file's directory; package
imports resolve offline from a config-specified vendored tree or the
system typst cache (tracey never downloads). The requirement-marker
prefix can be any identifier that is not a typst standard-library
global; rule-body version diffs use the same word-level
~~del~~ **ins** rendering as Markdown.

Also extracts the rule-coverage badge HTML into a free function so both
backends share it, and adds heading-slug dedup across mixed-format spec
sets.

@lovesegfault
Contributor Author

obviously extremely clauded, but i wanted to see if it'd work

@lovesegfault
Contributor Author

@fasterthanlime if you have a sec, i was curious if you're interested in this at all or if i should drop it.

i gave this a shot b/c i was struggling to write specs for more mathy subjects in markdown that i could also render as papers for people to read.

@fasterthanlime
Contributor

Hi yes, belated but yes I'm interested in that. Will rebase + merge eventually

Restructure spec.rs into a spec/ module with an enum-dispatched facade over per-format backends. Markdown delegates to marq; Typst is stubbed with explicit errors. Logic for marker-prefix extraction, marker rewriting and weight parsing is copied from crates/tracey/ unchanged so the originals can be routed here in a follow-up without behaviour change.
Replace direct marq calls and `ext == "md"` checks across crates/tracey with
the SpecFormat facade from tracey-core. Zero behaviour change for Markdown
specs; .typ paths now flow through the same dispatch (Typst backend stubbed).

- data.rs: scan/walk filters use is_spec_extension; rule extraction and
  diagnostics use parse_spec/parse_weight; delete duplicated
  extract_marker_prefix_from_content; add devicon for .typ; update help glob
- lib.rs: load_rules_from_glob dispatches on SpecFormat; delete duplicated
  extract_marker_prefix
- bump.rs: parse_spec_rules and marker rewrite take SpecFormat derived from
  staged-file path
- daemon/service.rs: LSP symbols/tokens/codelens/inlay/highlight/find-rule
  dispatch on SpecFormat; version-diff uses diff_inline with raw-text
  fallback; add typst LSP languageId; FORMAT-NOTE search snippet + inline
  code spans
- bridge/lsp.rs: clear-diagnostics filter uses is_spec_extension
- tracey-config: doc-comment mentions .typ

Intentionally deferred (Phase 7):
- data.rs combined-doc render with custom handlers
- service.rs search-snippet render (stored prose)
Wire the typst compiler behind a `typst-spec` feature (default-on in
the `tracey` bin). `render_display` builds a single-file in-memory
World, compiles to HTML via `typst_html`, then post-processes the
output: sentinel `<div data-req-id>` wrappers are replaced with the
caller's badge container, and heading slugs from the tree-sitter parse
are injected as `id=` attributes. `<style>`/`<link>` from the
compiler's `<head>` are lifted into `head_injections`.

`load_spec_content` now routes typst sections through this path, with
a `parse_spec` placeholder fallback when the feature is disabled.
…pecs

- bump: integration test for .typ spec files (#req marker rewrite)
- search: thread SpecFormat through RuleEntry/SearchResult so snippet
  rendering can dispatch by dialect; typst snippets are now html-escaped
  (preserving <mark> tags) instead of being misrendered as markdown
- service: document known limitation in git-history format inference
  for cross-format spec renames
- zed: register Typst language with the LSP
- watcher: rebuild test for .typ spec edits (full reparse path); helper
  refactored to accept arbitrary spec filenames
- lsp: semantic-token test for .typ spec files asserts one token per
  #req marker with the DEFINITION modifier
SpecWorld now carries a base_dir and resolves non-main FileIds against it via VirtualPath::resolve, with per-FileId caching for both source() and file(). Package imports remain rejected (no package manager). render_display gains a base_dir parameter; the data layer derives it from the spec file's parent directory. Adds unit tests for relative-import resolution and package rejection, extends the fixtures-typst integration to import a helper file, and updates the docs limitation note.
The previous `ident.len() <= 5` heuristic false-positived on short
typst stdlib calls like `#image("foo.png")` and `#link("url")[text]`,
poisoning downstream prefix inference (which hard-errors on mixed
prefixes). It also arbitrarily rejected legitimate longer prefixes like
`#requirement(...)`.

Replace it with an explicit denylist of typst standard-library globals,
checked via binary_search. Any non-stdlib ident is now a valid marker
prefix; the existing "multiple requirement marker prefixes" error in
tracey::data remains the safety net for genuine mismatches.
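
A sketch of the denylist check described above, assuming a sorted constant slice. `TYPST_BUILTINS` is named elsewhere in this PR, but the entries listed here are only an illustrative subset and `is_valid_marker_prefix` is a hypothetical name:

```rust
/// Illustrative subset of typst standard-library globals.
/// The slice must stay sorted, because `binary_search` assumes sorted input.
const TYPST_BUILTINS: &[&str] = &["figure", "heading", "image", "link", "table", "text"];

/// Hypothetical predicate: any non-empty identifier that is not a
/// standard-library global is accepted as a marker prefix.
pub fn is_valid_marker_prefix(ident: &str) -> bool {
    !ident.is_empty() && TYPST_BUILTINS.binary_search(&ident).is_err()
}
```

Unlike the length heuristic, this accepts `requirement` and rejects `image` and `link` regardless of how long the identifier is.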
Word-level LCS diff producing ~~removed~~ / **added** markdown markup, matching the output convention of marq::diff_markdown_inline so LSP hovers and CLI "changes from previous version" render identically for both spec formats. Drops the corresponding limitation note from the typst docs.
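
A rough sketch of a word-level LCS diff with this markup. `diff_words_inline` is a hypothetical name, and this version splits on whitespace and wraps each changed word individually, which may differ from how the PR groups runs:

```rust
/// Word-level diff emitting `~~removed~~` and `**added**` markers.
pub fn diff_words_inline(old: &str, new: &str) -> String {
    let a: Vec<&str> = old.split_whitespace().collect();
    let b: Vec<&str> = new.split_whitespace().collect();

    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }

    // Walk both word lists, emitting kept, removed and added words in order.
    let mut out: Vec<String> = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < a.len() || j < b.len() {
        if i < a.len() && j < b.len() && a[i] == b[j] {
            out.push(a[i].to_string());
            i += 1;
            j += 1;
        } else if i < a.len() && (j == b.len() || lcs[i + 1][j] >= lcs[i][j + 1]) {
            out.push(format!("~~{}~~", a[i]));
            i += 1;
        } else {
            out.push(format!("**{}**", b[j]));
            j += 1;
        }
    }
    out.join(" ")
}
```

For example, diffing "the server must respond" against "the server shall respond" yields `the server ~~must~~ **shall** respond`.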
…reqs

Define `REQ_ANCHOR_PREFIX` / `req_anchor_id` once in `spec` and route all
Rust and dashboard consumers through it so the typst backend (and any
future backend) emits the same `r--{id}` anchors as marq. Teach
`extract_req` and the typst preludes to accept bare `#req("id")` with no
body, fixing the previous early-return that silently dropped such
definitions.
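
A minimal sketch of the shared anchor helper, assuming it takes the rule id as a string slice; the actual signature in `spec` is not shown in this PR text:

```rust
/// Prefix shared by every requirement anchor, per the commit message.
pub const REQ_ANCHOR_PREFIX: &str = "r--";

/// Build the `r--{id}` anchor for a requirement id (assumed signature).
pub fn req_anchor_id(id: &str) -> String {
    format!("{}{}", REQ_ANCHOR_PREFIX, id)
}
```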
SpecWorld hardcoded the main vpath as `spec.typ`, shadowing real sibling
files of that name, and only probed cache_dir for packages, missing
data_dir where `@local` packages live. Now the main vpath uses the
actual source file name, package resolution probes vendored → data →
cache (matching typst-kit), and the not-found hint is namespace-aware.
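
The probe order can be sketched as a first-match search over candidate roots. Everything here is an assumption for illustration: `resolve_package` is a hypothetical name, the `{root}/{namespace}/{name}/{version}` layout is assumed, and the `exists` closure stands in for a filesystem check so the logic stays testable:

```rust
use std::path::{Path, PathBuf};

/// Return the first root (in caller-supplied order, e.g. vendored, data,
/// cache) that contains the requested package directory.
pub fn resolve_package(
    roots: &[PathBuf],
    namespace: &str,
    name: &str,
    version: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    roots
        .iter()
        .map(|root| root.join(namespace).join(name).join(version))
        .find(|candidate| exists(candidate.as_path()))
}
```

In real use the closure would be something like `|p| p.is_dir()`; nothing in this sketch downloads anything, matching the offline-only behaviour described in the PR.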
…dynamic prelude

The typst pipeline previously ran tree-sitter and `typst::compile`
independently and reconciled them with a static prelude plus a positional
zip of slugs onto sentinel `<hN>`s. Any disagreement (a heading inside a
`#req` body, a `#heading()` call, a custom marker prefix) silently
corrupted the output, and `data.rs` then mutated outline slugs after the
HTML strings were already finalised so anchors and outline could diverge.

This makes the typst output self-describing so post-processing is keyed,
never positional:

- `spec::SlugAllocator` hands out globally-unique heading slugs across a
  multi-file spec.
- `build_prelude()` replaces the static prelude: it aliases every marker
  prefix discovered by the tree-sitter parse, and the `req` body now sets
  a nested `#show heading:` that emits plain `<hN>` so headings inside
  requirement bodies never produce `tracey-h` sentinels.
- `inject_heading_ids` claims each sentinel's slug from the allocator and
  writes it back into the heading struct; surplus sentinels (from
  `#heading()` calls tree-sitter doesn't see) get `section`/`section-2`
  anchors instead of being skipped.
- `load_spec_content` threads a single allocator through every render
  run, re-slugging marq headings in-place and rewriting their `id="..."`
  in the HTML. `dedup_heading_slugs` is gone.
Without the feature the typst section renders as a `<pre>` placeholder
with no heading anchors; the outline-dedup half of the test still holds.
Adding a language previously required synchronized edits at three sites
(SUPPORTED_EXTENSIONS, code_units::extract, extract_refs_with_warnings).
Lean was added at only one of the three and was silently dropped by the others.

A single define_languages! invocation now derives both the flat extension
list (always compiled) and the tree-sitter Lang table (behind 'reverse')
from one row per language, and the two code_units dispatches collapse to
a for_ext lookup.

Also picks up mts/cts which were dispatched but never walked.
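
A simplified sketch of the single-table idea using `macro_rules!`. The macro name, `SUPPORTED_EXTENSIONS` and `for_ext` come from the commit message; the macro body, the `Lang` enum shape and the rows below are assumptions, and the real tree-sitter table and the `reverse` feature gate are omitted:

```rust
/// One row per language; both the flat extension list and the
/// extension lookup are generated from the same rows.
macro_rules! define_languages {
    ($( $name:ident => [$($ext:literal),+ $(,)?] ),+ $(,)?) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum Lang { $($name),+ }

        /// Flat list of every extension from every row.
        pub const SUPPORTED_EXTENSIONS: &[&str] = &[$($($ext),+),+];

        impl Lang {
            /// Look up the language that owns an extension.
            pub fn for_ext(ext: &str) -> Option<Lang> {
                match ext {
                    $($($ext)|+ => Some(Lang::$name),)+
                    _ => None,
                }
            }
        }
    };
}

// Illustrative rows only; the real table is larger.
define_languages! {
    Rust => ["rs"],
    TypeScript => ["ts", "tsx", "mts", "cts"],
    Lean => ["lean"],
}
```

Because the walker's list and the dispatcher's lookup come from the same rows, an extension cannot be dispatched without also being walked.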
The allocator only tracked input bases, so alloc("intro"),
alloc("intro-2"), alloc("intro") returned a duplicate "intro-2". It
also debug-asserted that inputs never start with "r--", but marq's
hierarchical heading ids join parent and child with "--", so a spec
shaped like `# R` / `## Design` produced "r--design" and panicked.

Now alloc() rewrites only the literal "r--" prefix (`h-{rest}`, or
"section" when empty), then probes a per-base counter against a set of
every emitted slug so suffix candidates can never collide with a value
already handed out. Other hierarchical ids such as "auth--login" pass
through unchanged.
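
The behaviour described above can be sketched as follows. `SlugAllocator` and `alloc` are names from this PR; the field names and exact suffix scheme are assumptions:

```rust
use std::collections::{HashMap, HashSet};

/// Hands out heading slugs that are unique across every call.
#[derive(Default)]
pub struct SlugAllocator {
    /// Every slug ever returned, including suffixed ones.
    emitted: HashSet<String>,
    /// Next numeric suffix to try for each base.
    next_suffix: HashMap<String, usize>,
}

impl SlugAllocator {
    pub fn alloc(&mut self, input: &str) -> String {
        // Only the literal "r--" prefix is reserved for requirement anchors;
        // other hierarchical ids such as "auth--login" pass through.
        let base = match input.strip_prefix("r--") {
            Some("") => "section".to_string(),
            Some(rest) => format!("h-{rest}"),
            None => input.to_string(),
        };
        if self.emitted.insert(base.clone()) {
            return base;
        }
        // Probe suffixes against the emitted set so a candidate can never
        // collide with a slug that was handed out verbatim earlier.
        let counter = self.next_suffix.entry(base.clone()).or_insert(2);
        loop {
            let candidate = format!("{base}-{counter}");
            *counter += 1;
            if self.emitted.insert(candidate.clone()) {
                return candidate;
            }
        }
    }
}
```

With this shape, `alloc("intro")`, `alloc("intro-2")`, `alloc("intro")` returns `intro`, `intro-2`, `intro-3` rather than a duplicate `intro-2`.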

The markdown re-slug loop in data.rs now anchors its replace on the
`<hN id=` tag rather than a bare `id=` so it can never patch a
req-container div that shares the slug.
`rewrite_marker` no longer searches for the first `"` (which broke on
`#req(level: "shall", "a.b")`). It now takes an explicit `id_range`
and splices; the new `id_range_in_marker` dispatch locates the id via
`[`/`]` for markdown and a tree-sitter parse of the marker for typst,
where the positional id is the first direct `string` child of the arg
`group` (named args are nested under `tagged` and skipped).

`splice_req_badges` now keys `by_id` on `RuleId` and parses the
HTML-unescaped `data-req-id` literal before lookup, so `"a.b+1"`
correctly resolves to the version-1 definition instead of missing.
… positional zip

The prelude's heading show rule now emits the flattened heading text in a
data-base-slug attribute (via a recursive _ts content→string helper that
falls back to repr for math). assign_heading_ids reads each sentinel's
slug seed directly, HTML-unescapes it, slugifies, and allocates — no more
zipping compiler sentinels against tree-sitter headings by index.

This fixes slug shifting when #heading(..) calls or #include'd files emit
sentinels tree-sitter cannot see: previously every subsequent markup
heading inherited the wrong anchor.

doc.headings and doc.elements are now rebuilt entirely from sentinel order
in the compiled HTML (interleaving tracey-h and tracey-req positions) so
the outline lists every emitted heading and attributes reqs to the right
section. The lightweight parse() path keeps tree-sitter heading extraction
for line numbers; render() overwrites it with the authoritative list.
SpecWorld now records every non-package file the typst compiler reads
while resolving #import / #include. render_display drains those paths
into a caller-supplied out-param BEFORE checking compiled.output, so a
helper with a syntax error still registers as a dependency.

render_spec_content_for_impl relativizes the deps against project_root
and returns them alongside the rendered spec. The daemon engine stores
them in a RwLock<HashSet> populated on each lazy spec render; the
FilesChanged watcher filter consults that set after gitignore/temp
filtering but before the exclude/include glob checks, so editing a
helper that matches no config glob (or matches an impl exclude) now
triggers a rebuild.
render_spec_content_for_impl previously returned deps inside the Result
tuple, so a failed compile (e.g. syntax error in an #import-ed helper)
dropped them on the floor before the daemon could record them — undoing
the drain-before-? guarantee one frame higher.

Switch to a &mut HashSet out-param (matching load_spec_content and
render_display): relativize into it BEFORE propagating the load error,
and have the daemon service record the set unconditionally before
matching on the result. New integration test deps_reported_when_render_fails
locks this in.
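
A sketch of the out-param pattern, under stated assumptions: `Compiled` is a hypothetical stand-in for the compiler result, the signatures are simplified, and errors are plain strings. The point it illustrates is that dependencies are recorded before the error is propagated:

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a compile result plus the files it read.
pub struct Compiled {
    pub read_files: Vec<PathBuf>,
    pub output: Result<String, String>,
}

/// Drain the read files into `deps` BEFORE looking at the output, so a
/// failed compile still reports what it depended on.
pub fn render_display(compiled: Compiled, deps: &mut HashSet<PathBuf>) -> Result<String, String> {
    deps.extend(compiled.read_files);
    compiled.output
}

/// Relativize deps against the project root, then return the result.
/// Using `?` on `render_display` here would skip the relativize step.
pub fn render_for_impl(
    compiled: Compiled,
    project_root: &Path,
    deps: &mut HashSet<PathBuf>,
) -> Result<String, String> {
    let mut absolute = HashSet::new();
    let result = render_display(compiled, &mut absolute);
    for path in absolute {
        let relative = match path.strip_prefix(project_root) {
            Ok(rel) => rel.to_path_buf(),
            Err(_) => path.clone(), // outside the root: keep as-is
        };
        deps.insert(relative);
    }
    result
}
```

A caller can then record `deps` unconditionally and only afterwards match on the returned `Result`.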
The spec_deps short-circuit lived in a separate .filter() closure that
ran after the gitignore .filter(), so a typst #import helper inside a
gitignored directory was dropped before spec_deps could rescue it.

Collapse both closures into a single accept_changed_path() with a
documented precedence (spec_deps > gitignore > exclude > include),
hoist the is_dir() FS probe to the caller so the predicate is pure,
and add unit tests covering the gitignored-but-depended-on case.
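
The documented precedence can be sketched as a pure predicate over precomputed facts. `accept_changed_path` is the name used in the commit; the `PathFacts` struct and its fields are hypothetical, standing in for the real set lookup and glob matching:

```rust
/// Hypothetical bundle of facts the caller computes for one changed path.
#[derive(Debug, Clone, Copy, Default)]
pub struct PathFacts {
    pub is_spec_dep: bool,
    pub is_gitignored: bool,
    pub matches_exclude: bool,
    pub matches_include: bool,
}

/// Precedence: spec_deps > gitignore > exclude > include.
pub fn accept_changed_path(facts: PathFacts) -> bool {
    if facts.is_spec_dep {
        return true; // a recorded typst dependency always triggers a rebuild
    }
    if facts.is_gitignored || facts.matches_exclude {
        return false;
    }
    facts.matches_include
}
```

Keeping filesystem probes out of the predicate makes the gitignored-but-depended-on case a one-line unit test.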
…port strip, sidebar narrow-viewport persistence

- search: replace literal `<mark>` injection with U+E000/E001 PUA
  sentinels in both tantivy and simple-index paths. New `marks_to_html`
  (escape then swap) and `pua_to_mark` (swap only) helpers let the
  service layer escape user content without losing highlight spans, so a
  literal "<mark>" in a rule body no longer renders as a highlight.
- typst-package: fix repo URL (tracey-rs → bearcove) and add
  `repository` to typst.toml.
- tracey-core: strip_tracey_imports now triggers on any unbalanced `(`
  on the import line, not just a trailing one — `#import "...": (r,`
  with items on the first line is now blanked correctly.
- dashboard: sidebar auto-collapse on narrow viewports no longer
  overwrites the persisted wide-viewport preference. Uses a reactive
  matchMedia listener; localStorage writes are gated to wide mode and
  the saved choice is restored on widening.
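
For the search item above, a minimal sketch of the sentinel approach. `marks_to_html` and `pua_to_mark` are the helper names from the commit; the bodies, the constant names and the small `escape_html` function are assumptions:

```rust
/// Private-use code points marking the start and end of a highlight.
pub const MARK_OPEN: char = '\u{E000}';
pub const MARK_CLOSE: char = '\u{E001}';

/// Minimal HTML escaping for text content (illustrative, not exhaustive).
fn escape_html(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

/// Swap only: for content that is already safe HTML.
pub fn pua_to_mark(s: &str) -> String {
    s.replace(MARK_OPEN, "<mark>").replace(MARK_CLOSE, "</mark>")
}

/// Escape then swap: user text is escaped, sentinels become real tags.
pub fn marks_to_html(s: &str) -> String {
    pua_to_mark(&escape_html(s))
}
```

Because the sentinels survive escaping untouched, a literal `<mark>` typed in a rule body comes out as `&lt;mark&gt;` while genuine highlight spans still render.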
…lidate on config error

The any-ident extractor picked up third-party calls like unify's #qty("5","s")
as phantom markers, causing duplicate-ID config-load failures. And those
config failures exited 0 from `tracey query validate`, masking the break.

Replace the TYPST_BUILTINS denylist with an explicit r|req allowlist, and
make validate() return the config-error banner as has_errors when no
spec/impl combinations load.
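
The allowlist itself can be as small as the following sketch; `is_marker_ident` is a hypothetical name, and only the `r` and `req` entries come from the commit message:

```rust
/// Only `#r(...)` and `#req(...)` calls count as requirement markers.
pub fn is_marker_ident(ident: &str) -> bool {
    matches!(ident, "r" | "req")
}
```

Third-party calls such as `#qty("5","s")` are then ignored instead of being parsed as rule definitions.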
- dedupe is_spec_extension: drop sources:: copy, keep spec:: (SpecFormat-based)
  with .sdoc bridged until a SpecFormat::Sdoc variant lands
- extract_sdoc_rules_cached: CachedMarkdownFile/markdown_files were renamed to
  CachedSpecFile/spec_files on this branch
- sdoc.rs: ExtractedRule gained a `format` field; report Markdown for sdoc
  rules pending SpecFormat::Sdoc
- BadgeFn type alias (Arc<dyn Fn -> (String,String)>) replaces the borrowed
  &dyn Fn on RenderInput.badge_for: marq::with_req_handler requires a static
  handler, and the existing typst::RenderCtx already uses the (open,close)
  tuple shape.
- BadgeReqHandler adapts BadgeFn to marq::ReqHandler.
- reslug_marq_html ports the forward-cursor heading-slug rewrite from
  data.rs:3104-3128 verbatim.
- render_html does NOT yet configure diagram/inline-code handlers or
  source_path — those live in crates/tracey/ and need Task 5 to either move
  them or pass a pre-built RenderOptions through RenderInput.
…istry

- SpecFormat::backend() looks up the DynBackend; from_ext/name/from_name
  query BACKENDS instead of matching the enum
- parse_spec/diff_inline/parse_weight/extract_marker_prefix/id_range_in_marker
  bodies become fmt.backend().<op>(); signatures unchanged
- drop dead_code allow on BACKENDS/DynBackend; keep targeted allows on
  render_html/render_inline (wired in tasks 5/6)
- typst.rs carry-over: drop redundant move on badge adapter; document
  why RenderInput.root is unused
- RenderInput gains marq_opts (Option<&mut RenderOptions>) so the markdown
  backend reuses the caller-built diagram/inline-code handlers; backend
  overwrites source_path + req_handler per render
- BadgeFn now takes (&ReqDefinition, &str source_path) — backends supply
  the path, eliminating the TraceyRuleHandler / current_source_file mutex
  side-channel
- RenderOutput -> { sections: Vec<RenderedSection>, deps } so each backend
  controls its own section granularity (md: 1/run, typst: 1/file)
- new public facades render_spec_html / render_spec_inline hide DynBackend
- SpecConfigs::insert + ErasedConfig::new for caller-supplied config
  overrides until styx-subtree deserialization lands
- data.rs::load_spec_content: 113-line match -> 35-line format-agnostic loop
- delete TraceyRuleHandler (subsumed by BadgeFn + BadgeReqHandler)
facet-styx is string->struct only (no raw subtree value type), so the
generic `deserialize_config(raw)` design is unimplementable. Replace with
a fully-typed flow:

- tracey-config: SpecConfig.typst_package_path -> SpecConfig.format:
  FormatConfig { typst: TypstFormatConfig { package_path } }. One field
  per backend that needs config.
- tracey-core: deserialize_config -> default_config (it never deserializes);
  SpecConfigs::load() -> impl Default (infallible).
- tracey: new data::build_spec_configs(format, root) does the one
  irreducible per-format conversion (string path -> resolved PathBuf).
  load_spec_content / render_spec_content_for_impl now take &FormatConfig
  instead of Option<&Path>; data struct stores format_config_by_spec.
- typst error hints + docs updated to the new config key.
The test asserted custom prefixes like #spec(...) compile, but that
behaviour was deliberately removed when the TYPST_BUILTINS denylist was
replaced with an explicit r|req allowlist (third-party package calls were
being picked up as phantom markers). Invert the test to lock in the
allowlist decision and update writing-specs.md to match.