fix #207
Refactored the block syntax detection logic to use the actual
detector regexes themselves for easier maintenance, rather than
hacking together one regex that tries to do everything.
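
A minimal sketch of the approach described above, under stated assumptions: `HEADING_R` is copied from this commit, while `BLOCKQUOTE_SIMPLE_R` and `LIST_ITEM_SIMPLE_R` are hypothetical, much looser stand-ins for the real detectors in `index.js`. The idea is that block detection reuses the same regexes the parser rules already use, so a fix to one detector is picked up automatically.

```javascript
// Copied from this commit's index.js.
const HEADING_R = /^ *(#{1,6}) *([^\n]+)\n{0,2}/;

// Hypothetical simplified detectors, for illustration only.
const BLOCKQUOTE_SIMPLE_R = /^ *> ?[^\n]+/;
const LIST_ITEM_SIMPLE_R = /^ *[-*+] +[^\n]+/;

// One list of detectors instead of a single hand-built catch-all regex.
const BLOCK_SYNTAXES = [HEADING_R, BLOCKQUOTE_SIMPLE_R, LIST_ITEM_SIMPLE_R];

// True if any detector matches at the start of the input.
// None of these regexes use the `g` flag, so `test` carries no state between calls.
function containsBlockSyntax(input) {
  return BLOCK_SYNTAXES.some(r => r.test(input));
}

containsBlockSyntax('# Hello\n\nHow are you?'); // true
containsBlockSyntax('just *inline* text');      // false
```

Because every detector here is anchored with `^` and has no `m` flag, only the start of the (already left-trimmed) content is inspected.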
quantizor committed Aug 20, 2018
1 parent ad1fab9 commit cc3397c
Showing 4 changed files with 178 additions and 58 deletions.
77 changes: 36 additions & 41 deletions README.md
@@ -7,28 +7,27 @@ The most lightweight, customizable React markdown component.

<!-- TOC -->

- [Installation](#installation)
- [Usage](#usage)
- [Parsing Options](#parsing-options)
- [options.forceBlock](#optionsforceblock)
- [options.forceInline](#optionsforceinline)
- [options.overrides - Override Any HTML Tag's Representation](#optionsoverrides---override-any-html-tags-representation)
- [options.overrides - Rendering Arbitrary React Components](#optionsoverrides---rendering-arbitrary-react-components)
- [options.createElement - Custom React.createElement behavior](#optionscreateelement---custom-reactcreateelement-behavior)
- [options.slugify](#optionsslugify)
- [Getting the smallest possible bundle size](#getting-the-smallest-possible-bundle-size)
- [Usage with Preact](#usage-with-preact)
- [Gotchas](#gotchas)
- [Significant indentation inside arbitrary HTML](#significant-indentation-inside-arbitrary-html)
- [Code blocks](#code-blocks)
- [Nested lists](#nested-lists)
- [Using The Compiler Directly](#using-the-compiler-directly)
- [Changelog](#changelog)
- [Donate](#donate)
- [Credits](#credits)
- [Contributors](#contributors)
- [Backers](#backers)
- [Sponsors](#sponsors)
- [Installation](#installation)
- [Usage](#usage)
- [Parsing Options](#parsing-options)
- [options.forceBlock](#optionsforceblock)
- [options.forceInline](#optionsforceinline)
- [options.overrides - Override Any HTML Tag's Representation](#optionsoverrides---override-any-html-tags-representation)
- [options.overrides - Rendering Arbitrary React Components](#optionsoverrides---rendering-arbitrary-react-components)
- [options.createElement - Custom React.createElement behavior](#optionscreateelement---custom-reactcreateelement-behavior)
- [options.slugify](#optionsslugify)
- [Getting the smallest possible bundle size](#getting-the-smallest-possible-bundle-size)
- [Usage with Preact](#usage-with-preact)
- [Gotchas](#gotchas)
- [Significant indentation inside arbitrary HTML](#significant-indentation-inside-arbitrary-html)
- [Code blocks](#code-blocks)
- [Using The Compiler Directly](#using-the-compiler-directly)
- [Changelog](#changelog)
- [Donate](#donate)
- [Credits](#credits)
- [Contributors](#contributors)
- [Backers](#backers)
- [Sponsors](#sponsors)

<!-- /TOC -->

@@ -388,22 +387,22 @@ People usually write HTML like this:

Note the leading spaces before the inner content. This sort of thing unfortunately clashes with existing markdown syntaxes since 4 spaces === a code block and other similar collisions.

To get around this, `markdown-to-jsx` strips leading and trailing whitespace inside of arbitrary HTML within markdown. This means that certain syntaxes that use significant whitespace won't work in this edge case.
To get around this, `markdown-to-jsx` left-trims approximately as much whitespace as the first line inside the HTML block. So for example:

> NOTE! These syntaxes work just fine when you aren't writing arbitrary HTML wrappers inside your markdown. This is very much an edge case of an edge case. 🙃
#### Code blocks


```md
```html
<div>
`​`​`js
var some = code();
`​`​`
# Hello

How are you?
</div>
```

The two leading spaces in front of "# Hello" would be left-trimmed from all lines inside the HTML block. In the event that there are varying amounts of indentation, only the amount of the first line is trimmed.
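
The trimming rule above can be sketched as follows. This is an illustration only: `HTML_LEFT_TRIM_AMOUNT_R` and the trimming steps mirror this commit's `index.js`, but the wrapper function name `leftTrimByFirstLine` is hypothetical and not part of the library's API.

```javascript
// Captures the leading whitespace at the start of the content.
const HTML_LEFT_TRIM_AMOUNT_R = /^(\s*)/;

// Hypothetical helper: strip the first line's indentation from every line.
function leftTrimByFirstLine(content) {
  const [, whitespace] = content.match(HTML_LEFT_TRIM_AMOUNT_R);
  // `m` makes `^` match at each line start; `g` applies it to all lines.
  const trimmer = new RegExp(`^${whitespace}`, 'gm');
  return content.replace(trimmer, '');
}

leftTrimByFirstLine('  # Hello\n\n  How are you?\n');
// '# Hello\n\nHow are you?\n'

leftTrimByFirstLine('  * hi\n    * nested\n');
// '* hi\n  * nested\n' (deeper lines keep their relative indentation)
```

Lines indented less than the first line do not match the prefix and are left untouched.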

> NOTE! These syntaxes work just fine when you aren't writing arbitrary HTML wrappers inside your markdown. This is very much an edge case of an edge case. 🙃
#### Code blocks

⛔️

```md
@@ -412,17 +411,13 @@ var some = code();
</div>
```

#### Nested lists

This won't work at all at the moment. Trying to figure out a solution that will coexist peacefully with all the syntax permutations.

⛔️

```md
<div>
* something
* something related
* something else
```js
var some = code();
```
</div>
```
66 changes: 66 additions & 0 deletions __snapshots__/index.spec.js.snap
@@ -338,6 +338,38 @@ exports[`markdown-to-jsx compiler arbitrary HTML #185 misc regression test 1`] =
`;
exports[`markdown-to-jsx compiler arbitrary HTML #207 handles tables inside HTML 1`] = `
<details data-reactroot>
<summary>
Click here
</summary>
<table>
<thead>
<tr>
<th scope="col">
Heading 1
</th>
<th scope="col">
Heading 2
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
Foo
</td>
<td>
Bar
</td>
</tr>
</tbody>
</table>
</details>
`;
exports[`markdown-to-jsx compiler arbitrary HTML allows whitespace between attribute and value 1`] = `
<div data-reactroot
@@ -564,6 +596,40 @@ exports[`markdown-to-jsx compiler arbitrary HTML handles svg 1`] = `
`;
exports[`markdown-to-jsx compiler arbitrary HTML multiline left-trims by the same amount as the first line 1`] = `
<div data-reactroot>
<pre>
<code class="lang-kotlin">
fun main() {
print("Hello world")
}
</code>
</pre>
</div>
`;
exports[`markdown-to-jsx compiler arbitrary HTML nested lists work inside html 1`] = `
<div data-reactroot>
<ul>
<li>
hi
</li>
<li>
hello
<ul>
<li>
how are you?
</li>
</ul>
</li>
</ul>
</div>
`;
exports[`markdown-to-jsx compiler arbitrary HTML preserves the HTML given 1`] = `
<dd data-reactroot>
45 changes: 28 additions & 17 deletions index.js
@@ -97,17 +97,16 @@ const BLOCKQUOTE_R = /^( *>[^\n]+(\n[^\n]+)*\n*)+\n{2,}/;
const BLOCKQUOTE_TRIM_LEFT_MULTILINE_R = /^ *> ?/gm;
const BREAK_LINE_R = /^ {2,}\n/;
const BREAK_THEMATIC_R = /^(?:( *[-*_]) *){3,}(?:\n *)+\n/;
const CODE_BLOCK_FENCED_R = /^\s*(`{3,}|~{3,}) *(\S+)? *\n([\s\S]+?)\s*\1 *(?:\n *)+\n/;
const CODE_BLOCK_R = /^(?: {4}[^\n]+\n*)+(?:\n *)+\n/;
const CODE_BLOCK_FENCED_R = /^\s*(`{3,}|~{3,}) *(\S+)? *\n([\s\S]+?)\s*\1 *(?:\n *)+\n?/;
const CODE_BLOCK_R = /^(?: {4}[^\n]+\n*)+(?:\n *)+\n?/;
const CODE_INLINE_R = /^(`+)\s*([\s\S]*?[^`])\s*\1(?!`)/;
const CONSECUTIVE_NEWLINE_R = /^(?:\n *)*\n/;
const CR_NEWLINE_R = /\r\n?/g;
const DETECT_BLOCK_SYNTAX = /(^[-*] |^#+ ?\w|^ {2,}|^-{2,}|^> |^`{3})/m;
const FOOTNOTE_R = /^\[\^(.*)\](:.*)\n/;
const FOOTNOTE_REFERENCE_R = /^\[\^(.*)\]/;
const FORMFEED_R = /\f/g;
const GFM_TASK_R = /^\s*?\[(x|\s)\]/;
const HEADING_R = /^ *(#{1,6}) *([^\n]+?) *#* *\n+/;
const HEADING_R = /^ *(#{1,6}) *([^\n]+)\n{0,2}/;
const HEADING_SETEXT_R = /^([^\n]+)\n *(=|-){3,} *(?:\n *)+\n/;

/**
@@ -132,7 +131,7 @@ const HEADING_SETEXT_R = /^([^\n]+)\n *(=|-){3,} *(?:\n *)+\n/;
* 6. Capture excess newlines afterward
* \n*
*/
const HTML_BLOCK_ELEMENT_R = /^ *<([A-Za-z][^ >/]*) ?([^>]*)\/{0}>\s*((?:<\1[^>]*?>[\s\S]*?<\/\1>|(?!<\1)[\s\S])*?)<\/\1>\n*/;
const HTML_BLOCK_ELEMENT_R = /^ *<([A-Za-z][^ >/]*) ?([^>]*)\/{0}>\n?(\s*(?:<\1[^>]*?>[\s\S]*?<\/\1>|(?!<\1)[\s\S])*?)<\/\1>\n*/;

const HTML_COMMENT_R = /^<!--.*?-->/;

@@ -192,16 +191,7 @@ const TEXT_ESCAPED_R = /^\\([^0-9A-Za-z\s])/;
const TEXT_PLAIN_R = /^[\s\S]+?(?=[^0-9A-Z\s\u00c0-\uffff]|\d+\.|\n\n| {2,}\n|\w+:\S|$)/i;
const TRIM_NEWLINES_AND_TRAILING_WHITESPACE_R = /(^\n+|(\n|\s)+$)/g;

/**
* Indentation-significant syntaxes cannot be used inside arbitrary HTML at this time because
* it's not clear if the indentation is intentional or just there from how the composer
* laid things out.
*
* For code blocks, use fenced blocks instead (```).
*
* There's more detail on this in the README.
*/
const TRIM_HTML = /^[ \t]*|[ \t]*$/gm;
const HTML_LEFT_TRIM_AMOUNT_R = /^(\s*)/

const UNESCAPE_URL_R = /\\([^0-9A-Z\s])/gi;

@@ -256,6 +246,24 @@ const IMAGE_R = new RegExp(
'^!\\[(' + LINK_INSIDE + ')\\]\\(' + LINK_HREF_AND_TITLE + '\\)'
);

const BLOCK_SYNTAXES = [
BLOCKQUOTE_R,
CODE_BLOCK_R,
CODE_BLOCK_FENCED_R,
HEADING_R,
HEADING_SETEXT_R,
HTML_BLOCK_ELEMENT_R,
HTML_COMMENT_R,
HTML_SELF_CLOSING_ELEMENT_R,
LIST_ITEM_R,
LIST_R,
NP_TABLE_R,
];

function containsBlockSyntax (input) {
return BLOCK_SYNTAXES.some(r => r.test(input))
}

// based on https://stackoverflow.com/a/18123682/1141611
// not complete, but probably good enough
function slugify(str) {
@@ -1026,8 +1034,11 @@ export function compiler(markdown, options) {
match: anyScopeRegex(HTML_BLOCK_ELEMENT_R),
order: PARSE_PRIORITY_HIGH,
parse(capture, parse, state) {
const trimmed = capture[3].replace(TRIM_HTML, '');
const parseFunc = DETECT_BLOCK_SYNTAX.test(trimmed)
const [, whitespace] = capture[3].match(HTML_LEFT_TRIM_AMOUNT_R)
const trimmer = new RegExp(`^${whitespace}`, 'gm')
const trimmed = capture[3].replace(trimmer, '');

const parseFunc = containsBlockSyntax(trimmed)
? parseBlock
: parseInline;

48 changes: 48 additions & 0 deletions index.spec.js
@@ -970,6 +970,24 @@ $25
expect(root.innerHTML).toMatchSnapshot();
});

it('#207 handles tables inside HTML', () => {
render(
compiler(`
<details>
<summary>Click here</summary>
| Heading 1 | Heading 2 |
| --------- | --------- |
| Foo | Bar |
</details>
`)
)

// expect('').toMatchSnapshot();
expect(root.innerHTML).toMatchSnapshot();
});

it('#185 misc regression test', () => {
render(
compiler(
@@ -995,6 +1013,36 @@ Text content

expect(root.innerHTML).toMatchSnapshot();
});

it('multiline left-trims by the same amount as the first line', () => {
render(
compiler(`
<div>
\`\`\`kotlin
fun main() {
print("Hello world")
}
\`\`\`
</div>
`)
);

expect(root.innerHTML).toMatchSnapshot();
});

it('nested lists work inside html', () => {
render(
compiler(`
<div>
* hi
* hello
* how are you?
</div>
`)
);

expect(root.innerHTML).toMatchSnapshot();
});
});

describe('horizontal rules', () => {