refactor: adjust sanitizer to provide more data to the composer
quantizor committed Aug 18, 2024
1 parent 0b5a44a commit 4dc64b7
Showing 4 changed files with 138 additions and 30 deletions.
21 changes: 20 additions & 1 deletion .changeset/tricky-poems-collect.md
@@ -2,4 +2,23 @@
'markdown-to-jsx': minor
---

Allow disabling sanitization when `options.sanitization` is explicitly set to `false`.
Allow modifying HTML attribute sanitization when `options.sanitizer` is passed by the composer.

By default, a lightweight URL sanitizer function guards against common attack vectors, such as a `javascript:` URL placed in the `href` of an anchor tag. The sanitizer receives the input value, the HTML tag being targeted, the attribute name, and the default sanitizer, which you can call as a fallback when only certain cases need special handling.

To replace it with a custom sanitizer, pass a function as `options.sanitizer`:

```jsx
// The sanitizer in this situation would receive:
// ('javascript:alert("foo")', 'a', 'href', fn)
// Returning `value` unchanged, as below, disables sanitization for that attribute.

;<Markdown options={{ sanitizer: (value, tag, attribute, defaultFn) => value }}>
  {`[foo](javascript:alert("foo"))`}
</Markdown>

// or

compiler('[foo](javascript:alert("foo"))', {
  sanitizer: (value, tag, attribute, defaultFn) => value,
})
```
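
A custom sanitizer does not have to replace the default checks; it can run them first and then apply a stricter rule on top. The following is a minimal sketch of that pattern, not code from the library. The names `Sanitizer`, `httpsImagesOnly`, and `stubDefault` are hypothetical, and the `string | undefined` return type is an assumption based on the signature of `defaultSanitizeUrl` in this commit.

```typescript
// Shape of the sanitizer callback as described above. Allowing `undefined`
// in the return type is an assumption mirroring `defaultSanitizeUrl`.
type Sanitizer = (
  value: string,
  tag: string,
  attribute: string,
  defaultFn: (value: string) => string | undefined
) => string | undefined

// Hypothetical policy: keep the default checks and additionally require
// https for image sources.
const httpsImagesOnly: Sanitizer = (value, tag, attribute, defaultFn) => {
  // Run the default sanitizer first so its checks still apply.
  const checked = defaultFn(value)
  if (checked == null) return undefined

  // Extra rule applied only to <img src>.
  if (tag === 'img' && attribute === 'src' && !/^https:\/\//i.test(checked)) {
    return undefined
  }

  return checked
}

// Stand-in for the library's default sanitizer, for demonstration only.
const stubDefault = (value: string): string | undefined =>
  /^\s*javascript:/i.test(value) ? undefined : value

// An http image is dropped by the extra rule; an http link is left alone.
const imageResult = httpsImagesOnly('http://example.com/a.png', 'img', 'src', stubDefault)
const linkResult = httpsImagesOnly('http://example.com', 'a', 'href', stubDefault)
```

Because the extra rule only narrows what the default already accepts, this sketch cannot let through a value the default would have rejected. A function of this shape would be passed as `options.sanitizer`; the return type may need adjusting to match the declared `sanitizer` option type.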
22 changes: 22 additions & 0 deletions README.md
@@ -19,6 +19,7 @@ The most lightweight, customizable React markdown component.
- [options.createElement - Custom React.createElement behavior](#optionscreateelement---custom-reactcreateelement-behavior)
- [options.enforceAtxHeadings](#optionsenforceatxheadings)
- [options.renderRule](#optionsrenderrule)
- [options.sanitizer](#optionssanitizer)
- [options.slugify](#optionsslugify)
- [options.namedCodesToUnicode](#optionsnamedcodestounicode)
- [options.disableParsingRawHTML](#optionsdisableparsingrawhtml)
@@ -435,6 +436,27 @@ function App() {
}
````

#### options.sanitizer

By default, a lightweight URL sanitizer function guards against common attack vectors, such as a `javascript:` URL placed in the `href` of an anchor tag. The sanitizer receives the input value, the HTML tag being targeted, the attribute name, and the default sanitizer, which you can call as a fallback when only certain cases need special handling.

To replace it with a custom sanitizer, pass a function as `options.sanitizer`:

```jsx
// The sanitizer in this situation would receive:
// ('javascript:alert("foo")', 'a', 'href', fn)
// Returning `value` unchanged, as below, disables sanitization for that attribute.

;<Markdown options={{ sanitizer: (value, tag, attribute, defaultFn) => value }}>
  {`[foo](javascript:alert("foo"))`}
</Markdown>

// or

compiler('[foo](javascript:alert("foo"))', {
  sanitizer: (value, tag, attribute, defaultFn) => value,
})
```
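
Since the sanitizer receives the tag and attribute along with the value, it can also be used to observe what the default sanitizer rejects without changing its behavior. The following is a rough sketch under stated assumptions: `withRejectionLog`, `Rejection`, and `SanitizerFn` are hypothetical names, not part of markdown-to-jsx, and the default function is assumed to return `undefined` (or `null`) for a rejected value.

```typescript
// Assumed shape of the default sanitizer passed as the fourth argument.
type DefaultFn = (value: string) => string | undefined

// Assumed shape of a custom sanitizer, following the description above.
type SanitizerFn = (
  value: string,
  tag: string,
  attribute: string,
  defaultFn: DefaultFn
) => string | undefined

// Record of one value the default sanitizer refused.
interface Rejection {
  value: string
  tag: string
  attribute: string
}

// Hypothetical helper: builds a sanitizer that defers entirely to the
// default and appends every rejected value to `log`.
function withRejectionLog(log: Rejection[]): SanitizerFn {
  return (value, tag, attribute, defaultFn) => {
    const result = defaultFn(value)
    if (result == null) {
      log.push({ value, tag, attribute })
    }
    // The default's decision is returned unchanged.
    return result
  }
}
```

The returned function would be passed as `options.sanitizer`, and the `log` array could then be inspected after rendering to see which tags and attributes carried rejected values.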

#### options.slugify

By default, a [lightweight deburring function](https://github.com/probablyup/markdown-to-jsx/blob/bc2f57412332dc670f066320c0f38d0252e0f057/index.js#L261-L275) is used to generate an HTML id from headings. You can override this by passing a function to `options.slugify`. This is helpful when you are using non-alphanumeric characters (e.g. Chinese or Japanese characters) in headings. For example:
28 changes: 27 additions & 1 deletion index.compiler.spec.tsx
@@ -1184,7 +1184,7 @@ describe('links', () => {
jest.spyOn(console, 'warn').mockImplementation(() => {})
jest.spyOn(console, 'error').mockImplementation(() => {})

render(compiler('[foo](javascript:doSomethingBad)', { sanitization: false }))
render(compiler('[foo](javascript:doSomethingBad)', { sanitizer: x => x }))

expect(root.innerHTML).toMatchInlineSnapshot(`
<a href="javascript:doSomethingBad">
@@ -1195,6 +1195,32 @@ describe('links', () => {
expect(console.warn).not.toHaveBeenCalled()
})

it('can conditionally sanitize HTML using options.sanitizer', () => {
jest.spyOn(console, 'warn').mockImplementation(() => {})
jest.spyOn(console, 'error').mockImplementation(() => {})

render(
compiler(
'[foo](javascript:doSomethingBad)\n![foo](javascript:doSomethingBad)',
{
sanitizer: (value, tag, attribute, defaultFn) =>
tag === 'a' ? value : defaultFn(value),
}
)
)

expect(root.innerHTML).toMatchInlineSnapshot(`
<p>
<a href="javascript:doSomethingBad">
foo
</a>
<img alt="foo">
</p>
`)

expect(console.warn).toHaveBeenCalledTimes(1)
})

it('should sanitize markdown links containing JS expressions', () => {
jest.spyOn(console, 'warn').mockImplementation(() => {})
jest.spyOn(console, 'error').mockImplementation(() => {})
97 changes: 69 additions & 28 deletions index.tsx
@@ -731,9 +731,10 @@ function normalizeAttributeKey(key) {
}

function attributeValueToJSXPropValue(
tag: MarkdownToJSX.HTMLTags,
key: keyof React.AllHTMLAttributes<Element>,
value: string,
sanitizeUrlFn: (url: string) => string
sanitizeUrlFn: MarkdownToJSX.Options['sanitizer']
): any {
if (key === 'style') {
return value.split(/;\s?/).reduce(function (styles, kvPair) {
@@ -751,7 +752,7 @@ function attributeValueToJSXPropValue(
return styles
}, {})
} else if (key === 'href' || key === 'src') {
return sanitizeUrlFn(value)
return sanitizeUrlFn(value, tag, key, defaultSanitizeUrl)
} else if (value.match(INTERPOLATION_R)) {
// return as a string and let the consumer decide what to do with it
value = value.slice(1, value.length - 1)
@@ -952,10 +953,6 @@ function matchParagraph(
return [match, captured]
}

function identity<T>(x: T): T {
return x
}

function defaultSanitizeUrl(url: string): string | undefined {
try {
const decoded = decodeURIComponent(url).replace(/[^A-Za-z0-9/:]/g, '')
@@ -1142,16 +1139,14 @@ export function compiler(
markdown: string = '',
options: MarkdownToJSX.Options = {}
) {
options.overrides = options.overrides || {}
options.slugify = options.slugify || slugify
options.overrides ||= {}
options.sanitizer ||= defaultSanitizeUrl
options.slugify ||= slugify
options.namedCodesToUnicode = options.namedCodesToUnicode
? { ...namedCodesToUnicode, ...options.namedCodesToUnicode }
: namedCodesToUnicode

// If "sanitization" is not explicitly set to false, it will be enabled by default
let sanitizeUrlFn = options.sanitization !== false ? defaultSanitizeUrl : identity

const createElementFn = options.createElement || React.createElement
options.createElement ||= React.createElement

// JSX custom pragma
// eslint-disable-next-line no-unused-vars
@@ -1166,7 +1161,7 @@ export function compiler(
) {
const overrideProps = get(options.overrides, `${tag}.props`, {})

return createElementFn(
return options.createElement(
getTag(tag, options.overrides),
{
...props,
@@ -1236,7 +1231,10 @@ export function compiler(
return React.createElement(wrapper, { key: 'outer' }, jsx)
}

function attrStringToMap(str: string): JSX.IntrinsicAttributes {
function attrStringToMap(
tag: MarkdownToJSX.HTMLTags,
str: string
): JSX.IntrinsicAttributes {
const attributes = str.match(ATTR_EXTRACTOR_R)
if (!attributes) {
return null
@@ -1251,9 +1249,10 @@ export function compiler(

const mappedKey = ATTRIBUTE_TO_JSX_PROP_MAP[key] || key
const normalizedValue = (map[mappedKey] = attributeValueToJSXPropValue(
tag,
key,
value,
sanitizeUrlFn
options.sanitizer
))

if (
@@ -1375,7 +1374,7 @@ export function compiler(
parse(capture /*, parse, state*/) {
return {
// if capture[3] it's additional metadata
attrs: attrStringToMap(capture[3] || ''),
attrs: attrStringToMap('code', capture[3] || ''),
lang: capture[2] || undefined,
text: capture[4],
type: RuleType.codeBlock,
@@ -1424,7 +1423,15 @@ export function compiler(
},
render(node, output, state) {
return (
<a key={state.key} href={sanitizeUrlFn(node.target)}>
<a
key={state.key}
href={options.sanitizer(
node.target,
'a',
'href',
defaultSanitizeUrl
)}
>
<sup key={state.key}>{node.text}</sup>
</a>
)
@@ -1504,10 +1511,14 @@ export function compiler(
const noInnerParse =
DO_NOT_PROCESS_HTML_ELEMENTS.indexOf(tagName) !== -1

const tag = (
noInnerParse ? tagName : capture[1]
).trim() as MarkdownToJSX.HTMLTags

const ast = {
attrs: attrStringToMap(capture[2]),
attrs: attrStringToMap(tag, capture[2]),
noInnerParse: noInnerParse,
tag: (noInnerParse ? tagName : capture[1]).trim(),
tag,
} as {
attrs: ReturnType<typeof attrStringToMap>
children?: ReturnType<MarkdownToJSX.NestedParser> | undefined
@@ -1548,9 +1559,11 @@ export function compiler(
match: anyScopeRegex(HTML_SELF_CLOSING_ELEMENT_R),
order: Priority.HIGH,
parse(capture /*, parse, state*/) {
const tag = capture[1].trim() as MarkdownToJSX.HTMLTags

return {
attrs: attrStringToMap(capture[2] || ''),
tag: capture[1].trim(),
attrs: attrStringToMap(tag, capture[2] || ''),
tag,
}
},
render(node, output, state) {
@@ -1583,7 +1596,12 @@ export function compiler(
key={state.key}
alt={node.alt || undefined}
title={node.title || undefined}
src={sanitizeUrlFn(node.target)}
src={options.sanitizer(
node.target,
'img',
'src',
defaultSanitizeUrl
)}
/>
)
},
@@ -1605,7 +1623,16 @@ export function compiler(
},
render(node, output, state) {
return (
<a key={state.key} href={sanitizeUrlFn(node.target)} title={node.title}>
<a
key={state.key}
href={options.sanitizer(
node.target,
'a',
'href',
defaultSanitizeUrl
)}
title={node.title}
>
{output(node.children, state)}
</a>
)
@@ -1734,7 +1761,12 @@ export function compiler(
<img
key={state.key}
alt={node.alt}
src={sanitizeUrlFn(refs[node.ref].target)}
src={options.sanitizer(
refs[node.ref].target,
'img',
'src',
defaultSanitizeUrl
)}
title={refs[node.ref].title}
/>
) : null
@@ -1758,7 +1790,12 @@ export function compiler(
return refs[node.ref] ? (
<a
key={state.key}
href={sanitizeUrlFn(refs[node.ref].target)}
href={options.sanitizer(
refs[node.ref].target,
'a',
'href',
defaultSanitizeUrl
)}
title={refs[node.ref].title}
>
{output(node.children, state)}
@@ -2384,11 +2421,15 @@ export namespace MarkdownToJSX {
state: State
) => React.ReactChild


/**
* Whether to enable markdown-to-jsx's built-in sanitization.
* Override the built-in sanitizer function for URLs, etc., if desired.
*/
sanitization: boolean
sanitizer: (
value: string,
tag: HTMLTags,
attribute: string,
defaultFn: (value: string) => string
) => string

/**
* Override normalization of non-URI-safe characters for use in generating