
@maximilianfalco
Contributor

@maximilianfalco maximilianfalco commented Nov 26, 2025

PR App Fix RM-XYZ

🧰 Changes

Note

One thing to point out up front: this PR only adds the rendering logic for the new engine. Wiring the engine into validation and Markdown editing will come in separate PRs.

Context

This PR exports two new libraries that provide a new way to render mixed Markdown + MDX content in our application.
This lets customers embed MDX inside Markdown without relying on the strict MDX renderer or migrating everything to MDX (which currently causes many errors and requires hours of cleanup).

Important

With the addition of the new libraries, we have unfortunately exceeded the maximum allowed bundle size. The bundle is now 762KB against the previous limit of 750KB, so the limit has been raised to 775KB.

Changes

  1. mdxish.ts
  • Engine to convert a Markdown + MDX string into an HTML AST.
  • Based on Greg’s prototype and uses Unified plugins.
  • Handles MDX by preprocessing its syntax.
  • Additional logic:
    • Reuses existing transformers (e.g., callouts).
    • Adjusts MDX nodes when spacing breaks the AST.
  • Custom component handling:
    • Recursively parses inner content.
    • Renames nodes to PascalCase.
    • Includes heuristics to determine whether a tag is a real component vs. an HTML tag.
  2. renderMdxish.tsx
  • Converts the HTML AST into React JSX components.
  • Mimics the existing run.tsx behaviour used in production and returns an RMDXModule, which contains the content React component and the table of contents.
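
To make the "real component vs. HTML tag" heuristic and the PascalCase renaming above more concrete, here is a minimal sketch. The helper names (toPascalCase, isLikelyComponent), the short HTML tag list, and the exact rules are assumptions for illustration; the heuristics in mdxish.ts may differ.

```typescript
// Hypothetical helpers; names and rules are illustrative, not the PR's actual code.

// A small sample of standard HTML tag names; a real list would be much longer.
const SAMPLE_HTML_TAGS = new Set(['a', 'br', 'div', 'img', 'p', 'span', 'table']);

// Convert names such as "my-component" or "my_component" to "MyComponent".
function toPascalCase(name: string): string {
  return name
    .split(/[-_\s]+/)
    .filter(part => part.length > 0)
    .map(part => part.charAt(0).toUpperCase() + part.slice(1))
    .join('');
}

// Decide whether a tag should be treated as a component rather than plain HTML.
function isLikelyComponent(tagName: string, knownComponents: ReadonlySet<string>): boolean {
  // Standard HTML tags are never components, even with odd casing (e.g. <DIV>).
  if (SAMPLE_HTML_TAGS.has(tagName.toLowerCase())) return false;
  // A registered component name (compared in PascalCase) always wins.
  if (knownComponents.has(toPascalCase(tagName))) return true;
  // Otherwise fall back to the JSX convention: components start with an uppercase letter.
  return /^[A-Z]/.test(tagName);
}
```

With these rules, `<br>` stays an HTML tag, `<Callout>` is treated as a component even when it is not registered, and `<my-widget>` only counts as a component if `MyWidget` is in the known set.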

We also expose another library called mix, but it isn't used for rendering. It is simply a wrapper around mdxish that returns stringified HTML instead of HAST, which can be useful for testing/development or whenever we need a string version of the HAST.

🧬 QA & Testing

How to Test

To test this new rendering engine directly in the ReadMe app:

  • Open two terminals:
    • ReadMe (branch: mdxish-demo)
    • Markdown (this branch)
  • Link Markdown to ReadMe:
    • In markdown: npm ci && npm run build
    • In readme: make link-markdown
  • Before starting ReadMe, set the mdx.server.enabled variable to false in config/development.js to disable MDX validation and allow content to render in view mode
  • Run the ReadMe repo and open any project — it should not crash
  • Create docs using mixed Markdown/MDX (via Raw mode) and verify they render correctly
    • You can use the files in tests/lib/mdxish/demo-docs as examples in your editor

Things to Test in Docs

  • Built-in ReadMe components
  • User-defined components
  • Table of contents
  • Unclosed tags (e.g., <br>)
  • Links, headings, formatting, etc.

📸 Some Screenshots

These screenshots show sample MD/MDX pages rendered using the new libraries. None of the screenshots or demos have correct validation yet; we purposely disabled validation to demo the new engine/library.

Screenshot 2025-11-27 at 01 05 44 Screenshot 2025-11-27 at 01 06 10 Screenshot 2025-11-27 at 01 06 27

Missing components do not immediately error the entire page

Screenshot 2025-11-27 at 01 07 05

maximilianfalco and others added 30 commits November 19, 2025 15:44
- created more tests
feat: add tests, stubs and exports
feat: first pass at migrating over mdxish code
@maximilianfalco maximilianfalco changed the title falco-dimas/mdxish feat(mdxish): add new MDXish engine Nov 26, 2025
@maximilianfalco
Contributor Author

We have created a documentation file that explains the flow of the new engine and all its moving parts (mostly). See it here directly -> docs/mdxish-flow.md

export { default as mdastV6 } from './mdastV6';
export { default as mdx } from './mdx';
export { default as mix } from './mix';
export { default as mdxish } from './mdxish';
Contributor Author

This and the export on L15 are the main entry points for the new engine. As mentioned above, these mimic run and compile from the existing mdx pipeline.

  • mix returns stringified HTML
  • mdxish returns HAST
  • renderMdxish takes in HAST and spits out React components


const processedContent = preprocessJSXExpressions(mdContent, jsxContext);

const processor = unified()
Contributor Author

@maximilianfalco maximilianfalco Nov 27, 2025

This is the main processor for mdxish. It's a recursive process, so child nodes are processed as well. More details on the plugins:

| Phase | Plugin | Purpose |
| --- | --- | --- |
| Pre-process | preprocessJSXExpressions | Evaluate {expressions} before parsing |
| MDAST | remarkParse | Markdown → AST |
| MDAST | remarkFrontmatter | Parse YAML frontmatter (metadata) |
| MDAST | defaultTransformers | Transform callouts, code tabs, images, gemojis |
| MDAST | mdxishComponentBlocks | PascalCase HTML → mdxJsxFlowElement |
| MDAST | embedTransformer | [label](url "@embed") → embedBlock nodes |
| MDAST | variablesTextTransformer | {user.*} → <Variable> nodes (regex-based) |
| MDAST | tailwindTransformer | Process Tailwind classes (conditional, if useTailwind) |
| MDAST | remarkGfm | GitHub Flavored Markdown: tables, strikethrough, task lists, autolinks, footnotes |
| Convert | remarkRehype + handlers | MDAST → HAST |
| HAST | rehypeRaw | Raw HTML strings → HAST elements |
| HAST | rehypeSlug | Add IDs to headings |
| HAST | rehypeMdxishComponents | Match & transform custom components |
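
As one example of an MDAST-phase step from the list above, here is a rough sketch of the embedTransformer idea: a link whose title is "@embed" becomes an embedBlock node. The node shapes are simplified, and the fields on the embedBlock node (url, label) are assumptions; the real transformer may carry different data.

```typescript
// Simplified mdast link shape for illustration; real types come from @types/mdast.
interface EmbedLinkNode {
  type: 'link';
  url: string;
  title?: string | null;
  children: { type: 'text'; value: string }[];
}

// Hypothetical output shape; the fields on the real embedBlock node may differ.
interface EmbedBlockNode {
  type: 'embedBlock';
  url: string;
  label: string;
}

// [label](url "@embed") parses to a link with title "@embed"; only those links are rewritten.
function linkToEmbed(node: EmbedLinkNode): EmbedLinkNode | EmbedBlockNode {
  if (node.title !== '@embed') return node;
  const label = node.children.map(child => child.value).join('');
  return { type: 'embedBlock', url: node.url, label };
}
```

Ordinary links pass through untouched, so the step can run safely over every link in the tree.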

Contributor Author

A small note on the remarkRehype plugin. Since we are not using remarkMdx, remarkRehype doesn't know how to handle MDX nodes by default. We pass in our custom handler (mdxComponentHandlers) to convert the mdxJsxFlowElement nodes to HAST elements.
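
Roughly, such a handler maps an mdxJsxFlowElement onto a HAST element. The sketch below is standalone and simplified: the convertChildren callback stands in for whatever the real handler uses to convert child nodes, the 'div' fallback for unnamed elements is an assumption, and the actual mdxComponentHandlers may handle more attribute kinds.

```typescript
// Simplified node shapes for illustration only.
interface JsxAttr { type: 'mdxJsxAttribute'; name: string; value: string | null }
interface JsxFlowNode { type: 'mdxJsxFlowElement'; name: string | null; attributes: JsxAttr[]; children: unknown[] }
interface OutText { type: 'text'; value: string }
interface OutElement { type: 'element'; tagName: string; properties: Record<string, string | boolean>; children: OutChild[] }
type OutChild = OutElement | OutText;

// Convert one mdxJsxFlowElement into a HAST element.
function mdxJsxToHast(node: JsxFlowNode, convertChildren: (children: unknown[]) => OutChild[]): OutElement {
  const properties: Record<string, string | boolean> = {};
  for (const attr of node.attributes) {
    // A valueless attribute such as <Tabs open> is treated as boolean true.
    properties[attr.name] = attr.value === null ? true : attr.value;
  }
  return {
    type: 'element',
    // Assumption: unnamed elements (fragments) fall back to a plain div.
    tagName: node.name ?? 'div',
    properties,
    children: convertChildren(node.children),
  };
}
```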

.use(calloutTransformer)
.use(mdxishComponentBlocks)
.use(embedTransformer)
.use(variablesTextTransformer) // we can't rely on remarkMdx to parse the variable, so we have to parse it manually
Contributor Author

A small note on variablesTextTransformer: we can't use the existing variables transformer because it expects MDX as input and requires the remarkMdx plugin. Since we cannot use MDX, we had to create a new transformer that is text-based instead of MDX-based.

| | variables.ts | variables-text.ts |
| --- | --- | --- |
| Parser | Relies on remarkMdx | Uses regex |
| Input nodes | mdxFlowExpression, mdxTextExpression | text |
| Pipeline | Full MDX (run.tsx) | mdxish (lightweight) |
| Dependency | Needs remarkMdx + ESTree parsing | No MDX dependency |

Both produce the same output: Variable nodes with hName: 'Variable' and hProperties: { name: fieldName }.
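
A minimal sketch of the text-based approach, assuming a pattern like {user.fieldName}. The regex, the function name splitVariables, the 'variable' node type, and placing hName/hProperties under data are assumptions for illustration; the real variables-text.ts may differ in all of these.

```typescript
interface PlainTextNode { type: 'text'; value: string }
interface VariableNode {
  type: 'variable'; // assumed node type
  data: { hName: 'Variable'; hProperties: { name: string } };
}

// Split a text value into plain text and Variable nodes, e.g. "Hi {user.name}!".
function splitVariables(text: string): (PlainTextNode | VariableNode)[] {
  // Assumed pattern: {user.<identifier>}. A fresh regex per call avoids shared lastIndex state.
  const pattern = /\{user\.([A-Za-z_]\w*)\}/g;
  const out: (PlainTextNode | VariableNode)[] = [];
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    // Keep any plain text that sits before this variable.
    if (match.index > last) out.push({ type: 'text', value: text.slice(last, match.index) });
    out.push({ type: 'variable', data: { hName: 'Variable', hProperties: { name: match[1] } } });
    last = match.index + match[0].length;
  }
  // Keep any trailing plain text after the last variable.
  if (last < text.length) out.push({ type: 'text', value: text.slice(last) });
  return out;
}
```

Because the input is an ordinary text node, this needs neither remarkMdx nor ESTree parsing, which is the trade-off the comparison above describes.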

.use(remarkRehype, { allowDangerousHtml: true, handlers: mdxComponentHandlers })
.use(rehypeRaw)
.use(rehypeSlug)
.use(rehypeMdxishComponents, {
Contributor Author

@maximilianfalco maximilianfalco Nov 27, 2025

rehypeMdxishComponents is also a custom plugin we created to handle the final part of the pipeline. TL;DR: it mimics what MDX does overall and does the following:

  1. Component matching
  2. Prop normalization (e.g., class to className)
  3. Processing of child nodes
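
The three steps can be sketched on a simplified HAST tree as follows. This is illustrative only: the function names, the rename table, and the case-insensitive matching rule are assumptions (HTML parsing lowercases tag names, so some form of case-insensitive lookup is likely needed), not the plugin's actual implementation.

```typescript
interface SketchText { type: 'text'; value: string }
interface SketchElement { type: 'element'; tagName: string; properties: Record<string, unknown>; children: SketchNode[] }
type SketchNode = SketchText | SketchElement;

// Assumed rename table; only the class → className example comes from the list above.
const PROP_RENAMES = new Map<string, string>([['class', 'className']]);

// 2. Prop normalization: rename HTML-style props to their React equivalents.
function normalizeProps(props: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(props)) {
    out[PROP_RENAMES.get(key) ?? key] = props[key];
  }
  return out;
}

function resolveComponents(node: SketchNode, componentNames: string[]): SketchNode {
  if (node.type === 'text') return node;
  // 1. Component matching: compare case-insensitively against the known component names.
  const lowered = node.tagName.toLowerCase();
  const match = componentNames.find(name => name.toLowerCase() === lowered);
  return {
    type: 'element',
    tagName: match ?? node.tagName,
    properties: normalizeProps(node.properties),
    // 3. Process child nodes recursively.
    children: node.children.map(child => resolveComponents(child, componentNames)),
  };
}
```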


const tocHast = headings.length > 0 ? tocToHast(headings, MAX_DEPTH) : null;

return buildRMDXModule(content, headings, tocHast, contextOpts);
Contributor Author

renderMdxish converts a HAST tree (from mdxish) into a React component module. The pipeline:

  1. loadComponents() + merge user components → merged component map
  2. extractToc(tree, components) → headings array
  3. exportComponentsForRehype(components) → flattened component map for rehype-react
  4. createRehypeReactProcessor(componentsForRehype) → unified processor
  5. processor.stringify(tree) → React.ReactNode content
  6. tocToHast(headings, MAX_DEPTH) → TOC HAST structure
  7. buildRMDXModule(content, headings, tocHast, contextOpts) → final RMDXModule

Output: RMDXModule with default (main component), toc (heading data), and Toc (TOC component).

For the complete call tree, refer to this graph:

HAST Tree (input)
     │
     ├─→ extractToc() → headings[]
     │                       │
     │                       └─→ tocToHast() → tocHast
     │
     └─→ exportComponentsForRehype()
             │
             └─→ createRehypeReactProcessor()
                     │
                     └─→ processor.stringify() → React.ReactNode (content)
                                                          │
                                                          │
buildRMDXModule(content, headings, tocHast, opts) ←──────┘
     │
     └─→ RMDXModule {
           default: DefaultComponent,
           toc: headings,
           Toc: TocComponent
         }
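
To illustrate the extractToc step in the graph above, here is a small standalone sketch that walks a simplified HAST tree and collects headings. The TocHeading shape and the function names are assumptions; the real extractToc also takes the component map, which this sketch ignores.

```typescript
interface TocText { type: 'text'; value: string }
interface TocElement { type: 'element'; tagName: string; properties: { id?: string }; children: TocNode[] }
interface TocRoot { type: 'root'; children: TocNode[] }
type TocNode = TocText | TocElement;

// Hypothetical heading record; the real headings array may carry other fields.
interface TocHeading { depth: number; id: string; text: string }

// Concatenate all text inside a node, ignoring inline markup such as <em>.
function plainText(node: TocNode): string {
  return node.type === 'text' ? node.value : node.children.map(child => plainText(child)).join('');
}

// Depth-first walk that records h1 to h6 elements in document order.
function collectHeadings(node: TocRoot | TocNode, found: TocHeading[] = []): TocHeading[] {
  if (node.type === 'text') return found;
  if (node.type === 'element') {
    const m = /^h([1-6])$/.exec(node.tagName);
    if (m) {
      // The id is expected to have been added earlier in the pipeline (rehypeSlug).
      found.push({ depth: Number(m[1]), id: node.properties.id ?? '', text: plainText(node) });
      return found;
    }
  }
  for (const child of node.children) collectHeadings(child, found);
  return found;
}
```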

Contributor

This custom plugin is needed because:

  • Remark parses unknown tags as raw HTML; we rewrite them so downstream
    MDX/rehype tooling treats them as components (supports self-closing and wrapped content)
  • If there are empty lines inside the components, the remark parsers might incorrectly construct the tree; e.g. content beneath the component might get bundled up. So this plugin cleans the content up to prevent such cases
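
As a rough illustration of the first point, the sketch below rewrites a raw HTML node holding a single PascalCase tag into an mdxJsxFlowElement, covering both the self-closing and the wrapped form. The regex, the function name toComponentBlock, and keeping the inner content as one text child are simplifying assumptions; per the PR description the real engine parses inner content recursively, and the blank-line cleanup is not shown here.

```typescript
interface RawHtmlNode { type: 'html'; value: string }
interface ComponentAttr { type: 'mdxJsxAttribute'; name: string; value: string | null }
interface ComponentBlock {
  type: 'mdxJsxFlowElement';
  name: string;
  attributes: ComponentAttr[];
  children: { type: 'text'; value: string }[];
}

// Matches <Name attr="v" /> or <Name attr="v">inner</Name>, where Name starts with an uppercase letter.
const COMPONENT_TAG = /^<([A-Z][A-Za-z0-9]*)((?:\s+[A-Za-z_][\w-]*(?:="[^"]*")?)*)\s*(?:\/>|>([\s\S]*)<\/\1>)$/;

function toComponentBlock(node: RawHtmlNode): RawHtmlNode | ComponentBlock {
  const m = COMPONENT_TAG.exec(node.value.trim());
  // Anything that is not a single PascalCase tag stays raw HTML.
  if (!m) return node;

  const attributes: ComponentAttr[] = [];
  const attrPattern = /([A-Za-z_][\w-]*)(?:="([^"]*)")?/g;
  let attr: RegExpExecArray | null;
  while ((attr = attrPattern.exec(m[2])) !== null) {
    // Attributes without a value (e.g. <Tabs open>) get a null value.
    attributes.push({ type: 'mdxJsxAttribute', name: attr[1], value: attr[2] === undefined ? null : attr[2] });
  }

  // Simplification: inner content is kept as a single text child instead of being re-parsed.
  const inner = m[3];
  return {
    type: 'mdxJsxFlowElement',
    name: m[1],
    attributes,
    children: inner ? [{ type: 'text', value: inner }] : [],
  };
}
```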
