
feat(latex): add LaTeX language support #202

Open
NandishNaik01 wants to merge 6 commits into bearcove:main from NandishNaik01:feat/add-latex


NandishNaik01 commented May 15, 2026

Add LaTeX Syntax Highlighting Support to Arborium

Overview

This PR introduces syntax highlighting support for LaTeX (.tex) files in Arborium.

The implementation uses the [latex-lsp/tree-sitter-latex](https://github.com/latex-lsp/tree-sitter-latex) grammar at commit 7e0ecdc02926c7b9b2e0c76003d4fe7b0944f957 (MIT licensed).

Classification

  • Group: group-willow

    • Added alongside other markup/document languages such as Markdown, Typst, and AsciiDoc.
  • Tier: 3

    • Classified as a niche document markup language.
  • External Scanner: true

    • The grammar depends on an external scanner.c implementation.

Highlights Query Implementation

The upstream tree-sitter-latex grammar does not ship with a queries/ directory, so highlights.scm was implemented entirely from scratch.

The query covers the following constructs:

Commands

  • command_name → @function

Environments

  • \begin{...} → @function.builtin
  • \end{...} → @function.macro

Math Regions

  • displayed_equation
  • inline_formula
  • math_environment

All mapped to:

  • @markup.raw

Sectioning

Supports:

  • \section
  • \subsection
  • related heading commands

Mapped using:

  • @namespace
  • @markup.heading

Labels & References

  • curly_group_label → @label

Citations

  • curly_group_text_list → @string

File Inclusion

Covers:

  • \usepackage
  • \input
  • related include/import commands

Mapped to:

  • @keyword.storage.type
  • @keyword.control.import

Text Formatting

  • \textbf → @markup.bold
  • \textit / \emph → @markup.italic

Additional Coverage

  • Comments → @comment
  • Punctuation
  • Operators
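
As a rough sketch, the node-level mappings listed in this section could be written in highlights.scm along the following lines. It uses only node and capture names that appear in this description; the actual file in the PR also covers sectioning, file inclusion, and formatting commands, and its exact patterns may differ.

```scheme
; Sketch only: node and capture names are taken from the PR description,
; not copied from the actual highlights.scm.

; Commands
(command_name) @function

; Math regions
(displayed_equation) @markup.raw
(inline_formula) @markup.raw
(math_environment) @markup.raw

; Labels and references
(curly_group_label) @label

; Citations
(curly_group_text_list) @string
```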

Grammar Compatibility Notes

Field names were verified directly against the grammar’s node-types.json.

Some references from Helix queries required updates due to renamed fields in the upstream grammar:

  • curly_group_text → curly_group_label
  • path: → paths: in bibtex_include

Demo & Verification

Tested locally using:

cargo xtask build --dev latex
cargo xtask serve --dev

Highlighting renders correctly across multiple themes, including:

  • Tokyo Night
  • Catppuccin Latte
  • other bundled themes

The following syntax categories were visually verified:

  • commands
  • environments
  • math regions
  • sections/headings
  • comments
  • inline formatting

Tests

running 2 tests
test tests::test_corpus ... ok
test tests::test_grammar ... ok

test result: ok. 2 passed; 0 failed; 0 ignored

Files Added

langs/group-willow/latex/def/
├── arborium.yaml          # language definition
├── grammar/
│   ├── grammar.js         # from tree-sitter-latex
│   └── scanner.c          # from tree-sitter-latex src/scanner.c
├── queries/
│   └── highlights.scm     # custom highlights query
└── samples/
    └── sample.tex         # representative LaTeX sample

demo/samples/latex.tex     # live demo sample

NandishNaik01 and others added 4 commits May 16, 2026 03:14
Adds syntax highlighting for LaTeX (.tex) files using the
latex-lsp/tree-sitter-latex grammar (MIT, commit 7e0ecdc).

highlights.scm is authored from scratch (the grammar ships no queries),
using Helix's LaTeX queries as a reference with field names corrected
to match the current node-types.json (e.g. curly_group_label for labels,
curly_group_path_list for bibtex paths). Both corpus and grammar
validation tests pass.

Placed in group-willow alongside other markup languages (markdown,
typst, asciidoc).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Replace dead (word)/#eq? & pattern with (delimiter) @punctuation.delimiter
  The word rule excludes & by regex; & is tokenized as the delimiter node
- Remove required value: field from key_value_pair pattern so bare option
  keys (e.g. \documentclass[draft]{article}) receive @variable.parameter
- Add bare-declaration pattern for new_command_definition so \newcommand\foo
  receives @function.macro rather than falling back to generic @function

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Add hyperlink node captures for \url and \href — both parse as
  `hyperlink` not `generic_command`, so the previous generic_command
  #match? patterns were permanently dead code
- Update CHANGELOG.md with unreleased entry for LaTeX support
- Regenerate CI workflow via cargo xtask ci generate — latex was missing
  from the build-plugins-willow job

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@NandishNaik01
Author

Optional follow-up: indents.scm

LaTeX has a natural indentation boundary with \begin{env} / \end{env} — happy to add an indents.scm in a follow-up PR if that would be useful. No other group-willow language currently ships one, so I left it out of this PR to keep the scope minimal, but can add it if the maintainers want it.

Example of what it would look like:

(begin_command) @indent
(end_command) @outdent

Let me know!

NandishNaik01 and others added 2 commits May 16, 2026 03:33
Adds 6 previously uncovered node types:
- verbatim_environment, listing_environment, minted_environment → @markup.raw
- subscript and superscript (math $x_i^2$) → @markup.raw
- todo (\todo{...}) → command as @comment, message as @string

Updates sample.tex with lstlisting, subscript/superscript math,
and a \todo note to exercise the new patterns.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Adds automatic indentation support: content inside LaTeX environments
indents on \begin{...} and outdents on \end{...}.

Uses @indent.begin / @indent.end capture names per arborium convention.
First group-willow language to ship indents.scm.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@NandishNaik01
Author

Update: indents.scm is now included in this PR. It indents and outdents using the correct (begin) @indent.begin / (end) @indent.end capture names, per arborium convention. LaTeX is the first group-willow language to ship one.
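
Written out as a query file, the capture names mentioned above would look roughly like this. This is a sketch based only on this comment; the actual indents.scm in the PR may contain more patterns.

```scheme
; indents.scm (sketch)
(begin) @indent.begin   ; start indenting after \begin{...}
(end) @indent.end       ; stop indenting at \end{...}
```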
