
fix(desktop): repair inline math rendering for LLM output #3666

Open
lightfront wants to merge 6 commits into esengine:main-v2 from lightfront:fix/inline-math-rendering

lightfront commented Jun 9, 2026


Fix: Inline math rendering for LLM output

Problem

LaTeX source was being rendered as raw text (with red KaTeX errors) in several common scenarios:

  1. Block math $$ glued to prose without blank lines
  2. Single digits ($1$, $2$) and multi-digit numbers rejected as currency
  3. Single uppercase letters ($S$, $A$) rejected as English words
  4. One-sided comparisons ($< B$, $A <$) not recognized as math
  5. $ symbols inside code blocks causing KaTeX parse errors
  6. Malformed single-line code blocks not protected
  7. Prose currency like "cost $5 and $6" had dollar signs converted to HTML entities

Solution

1. Block math blank-line repair (mathNormalize.ts Step 2.5)

When $$ appears directly after prose (a letter or punctuation) without a blank line, insert \n\n before it. This satisfies remark-math's requirement that block math be separated from surrounding content (CommonMark itself defines no math syntax).

Example:

# Before (broken)
decomposes as$$
\mathbf{6}\otimes...

# After (fixed)
decomposes as

$$
\mathbf{6}\otimes...
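A minimal TypeScript sketch of the Step 2.5 repair, for illustration only: `separateGluedDisplayMath` and `GLUE_CHAR` are hypothetical names, not the code in mathNormalize.ts. It assumes code segments are already protected and that $$ delimiters strictly alternate opener/closer, and it rewrites only openers, so a closing `x.$$` is left alone. The "letter" class here covers ASCII letters and common CJK ideographs only.

```typescript
// Hypothetical sketch of "Step 2.5": insert a blank line before an opening $$
// that is glued to prose. Assumes code segments were already protected and
// that $$ delimiters strictly alternate opener / closer.

// ASCII letters, common CJK ideographs, and sentence punctuation
// (including the full-width forms used in Chinese text).
const GLUE_CHAR = /[A-Za-z\u4E00-\u9FFF.,:;!?\u3002\uFF0C\uFF1A\uFF1B]/;

function separateGluedDisplayMath(src: string): string {
  let seen = 0; // number of $$ delimiters passed so far
  return src.replace(/\$\$/g, (match: string, offset: number) => {
    const isOpener = seen % 2 === 0;
    seen += 1;
    // Closers, and a $$ at the very start of the input, are left alone.
    if (!isOpener || offset === 0) return match;
    // Opener glued to a letter or punctuation: push it onto its own block.
    return GLUE_CHAR.test(src.charAt(offset - 1)) ? "\n\n" + match : match;
  });
}
```

Tracking opener/closer parity is a choice made for this sketch: a regex that looks only at the preceding character would also split closing delimiters that sit directly after the end of a formula.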

2. Improved math classifier (mathClassify.ts)

Added recognition for common minimal-LaTeX patterns:

Pure numbers and letters:

  • $1$, $2$, $42$, $2.5$ → math (counts, indices, values)
  • $S$, $A$, $I$ → math (set/algebra/group names)

Number combinations:

  • Comma-separated: $A, B$, $1, 2, 3$, $(A, B)$ → math (tuples, pairs)
  • With variables: $2.5x$, $3y^2$ → math (implicit multiplication)
  • With LaTeX: $10\%$, $5\cdot3$ → math (math operators)

One-sided comparisons:

  • $< B$, $<= 0$, $A <$, $B <=$ → math (implicit operand)
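The rules above could be approximated with a few anchored regular expressions tried in turn. This is a hypothetical sketch (`looksLikeMinimalMath`, `RULES`, `NUMBER` and `TOKEN` are invented names) that assumes it receives the text between a pair of single dollars. It covers only the minimal patterns listed here and omits whatever else mathClassify.ts does, such as operator handling for `$2.5x + 3$`.

```typescript
// Hypothetical sketch of the minimal-LaTeX rules; not the real mathClassify.ts.
// Input: the text between a pair of single dollars.

const NUMBER = String.raw`\d+(?:\.\d+)?`;
// A list token: one letter, a number, or a LaTeX command such as \alpha.
const TOKEN = String.raw`(?:[A-Za-z]|${NUMBER}|\\[A-Za-z]+)`;

const RULES: RegExp[] = [
  new RegExp(String.raw`^${NUMBER}$`),                    // $1$, $42$, $2.5$
  /^[A-Z]$/,                                              // $S$, $A$
  new RegExp(String.raw`^${NUMBER}[A-Za-z](?:\^\d+)?$`),  // $2.5x$, $3y^2$
  new RegExp(String.raw`^${NUMBER}\\(?:%|[A-Za-z]+)`),    // $10\%$, $5\cdot3$
  new RegExp(String.raw`^\(?\s*${TOKEN}(?:\s*,\s*${TOKEN})+\s*\)?$`), // $A, B$, $(A, B)$
  /^(?:<=?|>=?)\s*[A-Za-z0-9]+$/,                         // $< B$, $<= 0$
  /^[A-Za-z0-9]+\s*(?:<=?|>=?)$/,                         // $A <$, $B <=$
];

function looksLikeMinimalMath(body: string): boolean {
  const s = body.trim();
  // Math if any single rule matches the whole (trimmed) body.
  return RULES.some((rule) => rule.test(s));
}
```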

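Example:

"$1$ 和 $2$ 之间有无穷多有理数" ("there are infinitely many rationals between 1 and 2") → math (1 and 2)
"$S$ 非空" ("S is nonempty") → math (set S)
"把 $(A, B)$ 整体" ("take (A, B) as a whole") → math (ordered pair)
"A 的每个元素 $< B$" ("every element of A is < B") → math (comparison)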
"$2.5x + 3$" → math (number with variable)

3. Code block $ protection (mathNormalize.ts)

Escape $ as &#36; inside fenced code blocks and inline code so that KaTeX never tries to parse those dollar signs as math delimiters. The restoration step deliberately does NOT unescape &#36; back to $, so the HTML entities remain in the final output.

Example:

// Input
`r = r.replace(/\$\$/, ...)`

// After normalization
`r = r.replace(/&#36;&#36;/, ...)`

// Final HTML renders as
r = r.replace(/\$\$/, ...)
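A sketch of the escaping step under simple assumptions: code segments are found with one regex (fences tried before single-backtick spans), and `escapeDollarsInCode` and `CODE_SEGMENT` are hypothetical names rather than the real protectMarkdownCode. It replaces only the `$` characters and keeps any backslash in front of them.

```typescript
// Hypothetical sketch, not the real protectMarkdownCode: replace every $ inside
// fenced blocks (```...```) and inline code spans (`...`) with the HTML entity
// &#36;, so later math passes never see a delimiter there. Text outside code
// is returned unchanged. Fences are tried first so their backticks are not
// mistaken for inline spans.
const CODE_SEGMENT = /```[\s\S]*?```|`[^`\n]*`/g;

function escapeDollarsInCode(src: string): string {
  return src.replace(CODE_SEGMENT, (segment: string) => segment.replace(/\$/g, "&#36;"));
}
```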

4. Lenient fenced code detection (mathNormalize.ts)

Removed the requirement that ``` must appear after a newline. Now accepts ``` anywhere in the text, handling malformed code blocks that are all on one line.

5. Single-line code block fix (mathNormalize.ts)

When the entire document is on one line (no newlines), treat ``` as a simple toggle: first occurrence is opening, second is closing. This handles pasted documentation where code blocks are inline.
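Both changes, lenient markers and the single-line toggle, might be sketched as follows. `findFencedRanges` is hypothetical and returns `[start, end)` index pairs. It assumes that in multi-line text the opener's info string (e.g. `javascript`) runs to the end of its line, and that an unclosed fence protects everything to the end of the input, as an unclosed CommonMark fence does.

```typescript
// Hypothetical sketch of lenient fence detection. ``` markers are accepted
// anywhere, not only after a newline. In a document with no newlines at all,
// ``` acts as a simple toggle: the first occurrence opens, the next closes.
function findFencedRanges(src: string): Array<[number, number]> {
  const FENCE = "```";
  const singleLine = !src.includes("\n");
  const ranges: Array<[number, number]> = [];
  let open = src.indexOf(FENCE);
  while (open !== -1) {
    let searchFrom = open + FENCE.length;
    if (!singleLine) {
      // Multi-line: skip the opener's info string up to the end of its line.
      const eol = src.indexOf("\n", searchFrom);
      searchFrom = eol === -1 ? src.length : eol + 1;
    }
    // Single line: search for the closer right after the opening marker.
    const close = src.indexOf(FENCE, searchFrom);
    if (close === -1) {
      ranges.push([open, src.length]); // unclosed fence: protect to the end
      break;
    }
    const end = close + FENCE.length;
    ranges.push([open, end]);
    open = src.indexOf(FENCE, end);
  }
  return ranges;
}
```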

6. Preserve prose currency (mathNormalize.ts Step 5)

Changed the regex to non-greedy matching and removed HTML entity conversion for non-math pairs. Now "These two apples cost $5 and $6" preserves its dollar signs unchanged instead of converting them to &#36;5 and &#36;6.

Before:

Input:  "These two apples cost $5 and $6"
Output: "These two apples cost &#36;5 and &#36;6"

After:

Input:  "These two apples cost $5 and $6"
Output: "These two apples cost $5 and $6" (unchanged)
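The behavior change can be illustrated with a small sketch. `normalizeInlinePairs`, its `legacyEscape` flag, and the numbers-only `isNumber` classifier are hypothetical; the regex is an assumed stand-in for the real Step 5 pattern.

```typescript
// Hypothetical sketch of the Step 5 change. Pairs of single dollars are
// matched with a non-greedy body and handed to a classifier.
type Classifier = (body: string) => boolean;

function normalizeInlinePairs(src: string, isMath: Classifier, legacyEscape = false): string {
  return src.replace(/\$([^\n]+?)\$/g, (match: string, body: string) => {
    if (isMath(body)) return match; // math: left for remark-math / KaTeX
    // Rejected pair: the old code hid both dollars behind HTML entities,
    // the new code returns the original text unchanged.
    return legacyEscape ? `&#36;${body}&#36;` : match;
  });
}

// Demo classifier: treat pure numbers as math, everything else as prose.
const isNumber: Classifier = (body) => /^\d+(?:\.\d+)?$/.test(body.trim());
```

With `isNumber`, "These two apples cost $5 and $6" comes back unchanged by default, `legacyEscape = true` reproduces the old `&#36;5 and &#36;6` output, and "costs $5$ today" is kept as math either way.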

Changes

Files modified

  1. desktop/frontend/src/components/mathNormalize.ts

    • Added Step 2.5: blank-line insertion before glued $$
    • Modified protectMarkdownCode: escape $ to &#36; in code segments
    • Modified restoration: do NOT unescape &#36; back to $
    • Modified fencedCodeEnd: removed newline requirement, added single-line toggle logic
    • Modified Step 5: non-greedy regex, preserve original text for non-math pairs
  2. desktop/frontend/src/components/mathClassify.ts

    • Accept all pure numbers (single, multi-digit, decimals, percentages) as math
    • Accept numbers with variables ($2.5x$) or LaTeX ($10\%$) as math
    • Accept comma-separated lists as math
    • Accept one-sided comparisons as math
    • Accept single uppercase letters as math
  3. desktop/frontend/src/__tests__/math-golden.test.ts

    • Added "minimal LaTeX patterns (regression)" test section
    • Added tests for single digits, uppercase letters, comma-separated lists, one-sided comparisons
    • Added tests for single-line code blocks with $ symbols
    • Updated currency tests to reflect new behavior (dollars preserved)

Test coverage

  • All 122 tests passing (108 math-golden + 8 text-size + 6 provider-model-refresh)
  • Coverage includes real-world examples from user-reported issues
  • Currency handling now preserves dollar signs in prose

Trade-offs and limitations

Orphan $$ (model-side issue)

Problem: Model outputs $$... without closing $$, causing the parser to swallow everything until the next $$.

Why not fixed: Every attempt to rescue orphan $$ from the renderer side made the output worse (whole prose paragraphs wrapped in &#36;…&#36;).

Right fix: Upstream, via better LLM prompting or a post-generation lint.
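A post-generation lint for this case could start by counting display delimiters outside code. `hasOrphanDisplayMath` is a hypothetical helper; it assumes every `$$` outside fenced or inline code is a display-math delimiter, so an odd count means one was never closed. It only flags the problem rather than repairing it, which keeps the fix upstream of the renderer.

```typescript
// Hypothetical post-generation lint: report display math whose closing $$
// is missing. Assumes every $$ outside code is a display-math delimiter.
function hasOrphanDisplayMath(text: string): boolean {
  // Drop fenced blocks and inline code first so $$ inside code is ignored.
  const outsideCode = text.replace(/```[\s\S]*?```|`[^`\n]*`/g, "");
  const delimiters = outsideCode.match(/\$\$/g) ?? [];
  return delimiters.length % 2 === 1; // odd count: one $$ was never closed
}
```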

Testing

All existing tests pass, plus new regression tests covering:

  • Pure numbers: $1$, $42$, $2.5$ → math
  • Numbers with variables: $2.5x$ → math
  • Numbers with LaTeX: $10\%$ → math
  • Comma-separated lists: $A, B$, $(A, B)$ → math
  • One-sided comparisons: $< B$, $A <$ → math
  • Single uppercase letters: $S$, $A$ → math
  • Single-line code blocks with $ → protected
  • Prose currency: "cost $5 and $6" → dollars preserved

Run tests:

cd desktop/frontend
npm test

Real-world examples

These examples from user reports now render correctly:

### Chinese math text
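$1$ 和 $2$ 之间有无穷多有理数 ("there are infinitely many rationals between 1 and 2")
→ Both 1 and 2 render as math

$S$ 非空 ("S is nonempty")
→ S renders as math (set name)

把 $(A, B)$ 整体当作一个新对象 ("treat (A, B) as a single new object")
→ (A, B) renders as math (ordered pair)

A 的每个元素 $< B$ 的每个元素 ("every element of A is < every element of B")
→ < B renders as math (comparison)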

### Math with numbers
$2.5x + 3$
→ Entire expression renders as math

$10\%$ increase
→ 10\% renders as math

$42$ elements
→ 42 renders as math

### Currency in prose
costs $5 and $6
→ Dollar signs preserved unchanged

costs $5$ today
→ $5$ renders as math (pure number)

price is $10.50$
→ $10.50$ renders as math (decimal)

### Code blocks
```javascript
r = r.replace(/\$\$/, ...);
```

→ No KaTeX errors, $ symbols protected

github-actions bot added the labels desktop (Wails desktop app (desktop/**)) and v2 (Go rewrite (1.x), main-v2 branch, active development) on Jun 9, 2026
lightfront force-pushed the fix/inline-math-rendering branch from 5c0c8c4 to a8c5f5a on June 9, 2026 11:04
Four targeted fixes to the math-pipeline pre-pass that resolve cases
where the rendered chat output showed LaTeX source as raw text:

1. mathNormalize.ts (Step 2.5): when the model writes block math with
   the opening $$ glued to prose on the same line ('…decomposes
   as$$\n\mathbf{6}…'), CommonMark requires a blank line before
   the $$. remark-math otherwise creates an empty math node and the
   formula leaks out as literal text. Insert \n\n before any $$
   preceded by a letter or end-of-sentence punctuation. The
   freshly-rewritten \] → $$ from step 2 is not affected.

2. mathClassify.ts: classify single digits ($1$, $2$) as math —
   commonly used as set / sequence indices. Multi-digit numbers,
   decimals, and percentages stay literal (still currency / percentage).
   This is a deliberate behavior change documented in the comment.

3. mathClassify.ts: allow comma-separated tokens ('A, B', '1, 2, 3',
   '\\alpha, \\beta', '(A, B)') as math. These are typical of
   ordered-pair / tuple / enumeration notation. Currency and env-var
   usage never looks like this.

4. mathClassify.ts: allow single uppercase letters as math. In
   non-English math prose (Chinese / Japanese / Korean textbooks)
   single capital letters are extremely common as set / algebra /
   group / vector-space names, and the closing-dollar form $X$ is
   essentially never written for English words like I/A/V by hand.

Test changes: 4 existing currency/acronym assertions updated to
reflect the new behavior, 13 new regression tests covering all four
fixes including the user's specific cases ('$1$ 和 $2$' and
'$S$ 非空 / $S$ 有上界'). 98 math-golden tests pass, 112/112 across
all suites, typecheck clean.

Orphan $$ (model wrote display math but forgot the closing $$) is
documented as not-fixed-from-the-renderer: every attempt to rescue
the orphan from the renderer side made the output worse, so the fix
for that case is on the LLM side (post-generation lint or stricter
system prompt).
lightfront force-pushed the fix/inline-math-rendering branch from a8c5f5a to 5532392 on June 9, 2026 11:07
The classifier rules are language-agnostic, not specific to CJK text.
Updated test section name and descriptions to reflect that patterns
like single digits, comma-separated tokens, and one-sided operators
apply universally across languages. Chinese text in test cases remains
as real user examples, but the rules themselves are not CJK-specific.

Add defensive escaping for code blocks containing $ characters.
When protecting code (inline `...` or fenced ```...```), replace
$ with &#36; (HTML entity). On restoration, unescape back to $.

This prevents KaTeX from attempting to parse math delimiters that
appear in code examples, regex patterns, or template literals.

Fixes: Pasted documentation about the math pipeline itself no longer
shows red KaTeX error text.

Tests: 3 new cases added, 106/106 passing

Remove the requirement that ``` must appear after a newline. This
handles cases where documentation is pasted on a single line with
embedded code blocks containing $ symbols.

Previously: ``` markers were only recognized after \n
Now: ``` markers are recognized anywhere

This prevents KaTeX errors (red text) when processing malformed code
blocks that contain $ in regex patterns, template literals, or other
code examples.

All 120 tests pass.

Enhancements to inline math detection:
- Reject pure numbers (1, 2.5, 10) as currency/percentages
- Accept numbers with variables (2.5x, 3y^2) as math
- Accept numbers with LaTeX escapes (10\%) as math
- Fix single-line code block detection to protect $ in malformed markdown

This better matches real-world usage where 'costs $5' is currency
but '$2.5x + 3$' is clearly a mathematical expression.

All 122 tests pass (108 math-golden + 8 text-size + 6 provider-model-refresh).

Previously, the Step 5 regex would greedily match '$5 and $' as a single
math expression with content '5 and ', then convert it to '&#36;5 and &#36;'
because the classifier correctly identified it as non-math. This was visually
correct but had two problems:

1. The greedy match would consume the closing dollar that belonged to the
   next currency token, causing cascade replacements.
2. Prose currency like 'These two apples cost $5 and $6' would have its
   dollar signs converted to HTML entities, which works but is unnecessary
   noise in the rendered output.

Changes:
- Step 5 regex now uses non-greedy matching (+?), so a match stops at the
  nearest closing $ instead of running on into a later currency token
- When the classifier rejects a match, the original text is preserved
  unchanged (return _m) instead of being wrapped in HTML entities
- This keeps dollar signs visible in prose while still preventing them from
  being parsed as math

All 122 tests pass.
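The difference between greedy and non-greedy pair matching can be checked directly. `GREEDY`, `LAZY` and `firstPair` are hypothetical stand-ins, not the actual Step 5 regex.

```typescript
// Greedy vs non-greedy pair matching on prose with three dollar signs.
// Both patterns are assumed stand-ins for the real Step 5 regex.
const GREEDY = /\$([^\n]+)\$/;
const LAZY = /\$([^\n]+?)\$/;

function firstPair(text: string, pattern: RegExp): string | null {
  const m = text.match(pattern);
  return m ? m[0] : null;
}

const sample = "cost $5 and $6, or $7 in total";
console.log(firstPair(sample, GREEDY)); // → $5 and $6, or $
console.log(firstPair(sample, LAZY));   // → $5 and $
```

With only two dollar signs ("cost $5 and $6") both patterns return the same `$5 and $` span; the non-greedy form matters once a third `$` follows, which is where the greedy match swallows the next currency token.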
github-actions bot added the labels agent (Core agent loop (internal/agent, internal/control)) and provider (Model providers & selection (internal/provider)) on Jun 9, 2026
lightfront force-pushed the fix/inline-math-rendering branch from 7843e3b to 83dda9b on June 9, 2026 14:52
