A CommonMark-compliant fast and extensible Markdown parser and formatter for Elixir.
Compliant with the CommonMark and GitHub Flavored Markdown specifications, with extra extensions such as Wiki Links, Discord Markdown tags, and emoji. Also supports syntax highlighting out-of-the-box using the Autumn library.
Under the hood it calls the APIs of comrak, a fast Rust crate that ports the cmark fork maintained by GitHub, a widely adopted Markdown implementation.
Check out some samples at https://mdex-c31.pages.dev
Add the :mdex dependency to your mix.exs:
def deps do
  [
    {:mdex, "~> 0.1"}
  ]
end
Or install it in a script or Livebook with Mix.install/1:
Mix.install([{:mdex, "~> 0.1"}])
MDEx.to_html!("# Hello")
"<h1>Hello</h1>\n"
MDEx.to_html!("# Hello :smile:", extension: [shortcodes: true])
"<h1>Hello 😄</h1>\n"
Converts Markdown to an AST data structure that can be inspected and manipulated to change the content of the document.
The data structure has the same shape as the one used by Floki, so we can reuse the same APIs and keep the same mental model when working with either Markdown or HTML documents. Each node is represented as:
{name, attributes, children}
Example:
MDEx.parse_document!("# Hello")
[{"document", [], [{"heading", [{"level", 1}, {"setext", false}], ["Hello"]}]}]
Note that text nodes have neither attributes nor children, so they are represented as plain strings inside their parent's children list.
You can find the full AST spec in the types section of the MDEx module documentation.
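Because the AST consists only of tuples, lists, and strings, it can be manipulated with ordinary recursive Elixir functions. The sketch below is a hypothetical `DemoteHeadings` module (not part of MDEx) that increases the level of every heading by one, capped at 6, for example to embed a document under an existing page title. It relies only on the node shape shown above.

```elixir
defmodule DemoteHeadings do
  # Hypothetical helper, not an MDEx API.
  # A list of nodes (the document root or any children list): transform each node.
  def run(nodes) when is_list(nodes), do: Enum.map(nodes, &run/1)

  # Heading nodes: bump the "level" attribute, keeping it within 1..6.
  def run({"heading", attrs, children}) do
    attrs =
      Enum.map(attrs, fn
        {"level", level} -> {"level", min(level + 1, 6)}
        other -> other
      end)

    {"heading", attrs, run(children)}
  end

  # Any other element: keep name and attributes, recurse into children.
  def run({name, attrs, children}), do: {name, attrs, run(children)}

  # Text nodes are plain strings and are returned unchanged.
  def run(text) when is_binary(text), do: text
end

ast = [{"document", [], [{"heading", [{"level", 1}, {"setext", false}], ["Hello"]}]}]
DemoteHeadings.run(ast)
# => [{"document", [], [{"heading", [{"level", 2}, {"setext", false}], ["Hello"]}]}]
```

The transformed list is still a regular AST, so it can be handed to `MDEx.to_html!/1` like any other.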
Converts the AST to a human-readable document, most commonly HTML. For example:
MDEx.to_html!([{"document", [], [{"heading", [{"level", 1}, {"setext", false}], ["Hello"]}]}])
"<h1>Hello</h1>\n"
More formats can be added in the future through plugins.
Any missing attribute is filled with its default value, and extra attributes are ignored, so the following produces the same result:
MDEx.to_html!([{"document", [], [{"heading", [], ["Hello"]}]}])
"<h1>Hello</h1>\n"
Default values are chosen on a best-effort basis, so as a good practice you should provide all attributes for each node.
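One way to follow that advice without writing every tuple by hand is to wrap node construction in small functions. The `ASTBuilder` module below is a hypothetical sketch, not an MDEx API; it only covers the `document` and `heading` nodes with the attributes shown in this README, so check the MDEx types for the attributes of other node kinds.

```elixir
defmodule ASTBuilder do
  # Hypothetical constructors that spell out every attribute explicitly
  # instead of relying on MDEx defaults.
  def document(children) when is_list(children), do: {"document", [], children}

  # CommonMark headings range from level 1 to 6.
  def heading(level, children) when level in 1..6 and is_list(children) do
    {"heading", [{"level", level}, {"setext", false}], children}
  end
end

ast = [ASTBuilder.document([ASTBuilder.heading(1, ["Hello"])])]
# => [{"document", [], [{"heading", [{"level", 1}, {"setext", false}], ["Hello"]}]}]
```

The resulting list is identical to the fully specified AST given to `MDEx.to_html!/1` earlier, so it should render to the same `<h1>Hello</h1>\n`.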
Formatting a malformed AST returns {:error, %MDEx.DecodeError{}} describing what went wrong and where. For example:
{:error, decode_error} = MDEx.to_html([{"code", [{1, "foo"}], []}], [])
{:error,
%MDEx.DecodeError{
reason: :attr_key_not_string,
found: "1",
node: "(<<\"code\">>, [{1,<<\"foo\">>}], [])",
attr: "(1, <<\"foo\">>)",
kind: "Integer"
}}
decode_error |> Exception.message() |> IO.puts()
# invalid attribute key
#
# Expected an attribute key encoded as UTF-8 binary
#
# Got:
#
# 1
#
# Type:
#
# Integer
#
# In this node:
#
# (<<"code">>, [{1,<<"foo">>}], [])
#
# In this attribute:
#
# (1, <<"foo">>)
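MDEx already reports malformed input through `{:error, %MDEx.DecodeError{}}`, so the usual approach is simply to match on the result of `MDEx.to_html/2`. If you build ASTs programmatically and want to catch mistakes earlier, a pre-flight check can be written in plain Elixir. The hypothetical `ASTCheck` module below detects only the `:attr_key_not_string` case from the example (an attribute key that is not a binary); it is a sketch and does not replicate MDEx's full validation.

```elixir
defmodule ASTCheck do
  # Hypothetical pre-flight check, not an MDEx API.
  # Returns {:ok, ast} or {:error, {node_name, bad_attr}} for the first
  # attribute whose key is not a binary (depth-first, left to right).
  def validate(ast) do
    case find_bad_attr(ast) do
      nil -> {:ok, ast}
      found -> {:error, found}
    end
  end

  # Lists of nodes: stop at the first node that contains a bad attribute.
  defp find_bad_attr(nodes) when is_list(nodes), do: Enum.find_value(nodes, &find_bad_attr/1)

  # Text nodes carry no attributes.
  defp find_bad_attr(text) when is_binary(text), do: nil

  # Element nodes: assumes attributes are {key, value} pairs.
  defp find_bad_attr({name, attrs, children}) do
    case Enum.find(attrs, fn {key, _value} -> not is_binary(key) end) do
      nil -> find_bad_attr(children)
      bad -> {name, bad}
    end
  end
end

ASTCheck.validate([{"code", [{1, "foo"}], []}])
# => {:error, {"code", {1, "foo"}}}
```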
You can enable extensions and change the generated output by passing any of the available Comrak Options as keyword lists, plus an additional :features option.
The full documentation and list of all options, with descriptions and examples, can be found at the links below:
- :extension - https://docs.rs/comrak/latest/comrak/struct.ExtensionOptions.html
- :parse - https://docs.rs/comrak/latest/comrak/struct.ParseOptions.html
- :render - https://docs.rs/comrak/latest/comrak/struct.RenderOptions.html
- :features - see the available options below
- :sanitize (default false) - sanitize output using ammonia. Recommended if passing render: [unsafe_: true]
- :syntax_highlight_theme (default "onedark") - syntax highlight code fences using autumn themes. Pass the theme filename without special chars and without extension; for example, pass syntax_highlight_theme: "adwaita_dark" to use the Adwaita Dark theme.
- :syntax_highlight_inline_style (default true) - embed styles in the output for each generated token. If inline styles are disabled, you'll need to serve the CSS theme yourself to properly highlight code.
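Since all options are plain keyword lists, one way to keep settings consistent across a project is to define defaults once and merge per-call overrides into them. The hypothetical `MarkdownOpts` module below uses only option names that appear in this README; the one-level-deep merge via `Keyword.merge/3` is an assumption about how you might want overrides to behave, not something MDEx does for you.

```elixir
defmodule MarkdownOpts do
  # Hypothetical project-wide defaults. Each top-level key
  # (:extension, :render, :features) holds its own keyword list.
  @defaults [
    extension: [table: true, tasklist: true, shortcodes: true],
    render: [unsafe_: true],
    # sanitize is enabled because unsafe_ rendering is turned on above.
    features: [sanitize: true, syntax_highlight_theme: "onedark"]
  ]

  # Merges overrides one level deep: overriding a single key inside
  # :features keeps the other :features defaults (e.g. sanitize: true).
  def build(overrides \\ []) do
    Keyword.merge(@defaults, overrides, fn _group, default, override ->
      Keyword.merge(default, override)
    end)
  end
end

opts = MarkdownOpts.build(features: [syntax_highlight_theme: "adwaita_dark"])
opts[:features]
# => [sanitize: true, syntax_highlight_theme: "adwaita_dark"]
```

The result would then be passed as the options argument, e.g. `MDEx.to_html!(markdown, opts)`.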
See some examples below of how to use the provided options:
GitHub Flavored Markdown with emojis
MDEx.to_html!(
~S"""
# GitHub Flavored Markdown :rocket:
- [x] Task A
- [x] Task B
- [ ] Task C
| Feature | Status |
| ------- | ------ |
| Fast | :white_check_mark: |
| GFM | :white_check_mark: |
Check out the spec at https://github.github.com/gfm/
""",
extension: [
strikethrough: true,
tagfilter: true,
table: true,
autolink: true,
tasklist: true,
footnotes: true,
shortcodes: true,
],
parse: [
smart: true,
relaxed_tasklist_matching: true,
relaxed_autolinks: true
],
render: [
github_pre_lang: true,
escape: true
]
) |> IO.puts()
# <p>GitHub Flavored Markdown 🚀</p>
# <ul>
# <li><input type="checkbox" checked="" disabled="" /> Task A</li>
# <li><input type="checkbox" checked="" disabled="" /> Task B</li>
# <li><input type="checkbox" disabled="" /> Task C</li>
# </ul>
# <table>
# <thead>
# <tr>
# <th>Feature</th>
# <th>Status</th>
# </tr>
# </thead>
# <tbody>
# <tr>
# <td>Fast</td>
# <td>✅</td>
# </tr>
# <tr>
# <td>GFM</td>
# <td>✅</td>
# </tr>
# </tbody>
# </table>
# <p>Check out the spec at <a href="https://github.github.com/gfm/">https://github.github.com/gfm/</a></p>
MDEx.to_html!(~S"""
```elixir
String.upcase("elixir")
```
""",
features: [syntax_highlight_theme: "onedark"]
) |> IO.puts()
# <pre class="autumn highlight" style="background-color: #282C34; color: #ABB2BF;">
# <code class="language-elixir" translate="no">
# <span class="namespace" style="color: #61AFEF;">String</span><span class="operator" style="color: #C678DD;">.</span><span class="function" style="color: #61AFEF;">upcase</span><span class="" style="color: #ABB2BF;">(</span><span class="string" style="color: #98C379;">"elixir"</span><span class="" style="color: #ABB2BF;">)</span>
# </code>
# </pre>
A Livebook and a script are available to play and experiment with this library, or you can check out all available samples at https://mdex-c31.pages.dev
Are you using MDEx and want to list your project here? Please send a PR!
A simple script is available to compare existing libs:
Name              ips        average  deviation         median         99th %
cmark         22.82 K      0.0438 ms    ±16.24%      0.0429 ms      0.0598 ms
mdex           3.57 K        0.28 ms     ±9.79%        0.28 ms        0.33 ms
md             0.34 K        2.95 ms    ±10.56%        2.90 ms        3.62 ms
earmark        0.25 K        4.04 ms     ±4.50%        4.00 ms        4.44 ms
Comparison:
cmark 22.82 K
mdex 3.57 K - 6.39x slower +0.24 ms
md 0.34 K - 67.25x slower +2.90 ms
earmark 0.25 K - 92.19x slower +4.00 ms
MDEx was born out of the need for a library that parses CommonMark files correctly, handles hundreds of files quickly, and is easily extensible. None of the existing libraries described below met all of these requirements at the time.
- earmark is extensible but can't parse all kinds of documents and is slow when converting hundreds of Markdown files.
- md is very extensible, but its documentation says "If one needs to perfectly parse the common markdown, Md is probably not the correct choice", which is likely why it fails to parse many documents.
- markdown is not precompiled and has not received updates in a while.
- cmark is a fast CommonMark parser, but it requires compiling the C library, is hard to extend, and was archived in April 2024.
Note that MDEx is the only one that syntax highlights out of the box, which contributes to making it slower than cmark.
To finish, a friendly reminder that all libs have their own strengths and trade-offs, so use the one that best suits your needs.
At DockYard we are ready to help you build your next Elixir project. We have unmatched expertise in Elixir and Phoenix development, and we love to write about Elixir.
Have a project in mind? Get in touch!
- comrak crate for all the heavy work on parsing Markdown and rendering HTML
- Floki for the AST manipulation
- Logo created by Freepik - Flaticon
- Logo font designed by Alan Greene