-
| I noticed that  Here's a test case:
import { toMarkdown } from "mdast-util-to-markdown";
const tree = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "foo" },
        {
          type: "strong",
          children: [{ type: "text", value: " this is bold " }],
        },
        { type: "text", value: "bar" },
      ],
    },
  ],
};
console.log(toMarkdown(tree));
Current output:  Expected output:  | 
Replies: 6 comments 24 replies
-
| An interesting question @gschlager. The example content shared as mdast is itself not valid markdown.
const tree = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "foo" },
        {
          type: "strong",
          children: [{ type: "text", value: " this is bold " }],
        },
        { type: "text", value: "bar" },
      ],
    },
  ],
};
adds spaces around  I'm not sure I see the stringifier as being the place to put content validation/content fixing. | 
-
| Hey again @gschlager! Sooo, it took me hours and hours of thinking but I managed to reproduce your AST in markdown 😅
->
foo this is bold bar
Pretty doubtful that anyone would ever write that though. But even before I figured it out, I had already become more understanding of the use cases you mention, where folks are working on the ASTs and injecting punctuation/whitespace/whatever in text or adding/removing emphasis nodes.
We already use character references in a couple places. When things can be input:
 this 
-> <p> this </p>
…we try and output it too:
import {toMarkdown} from 'mdast-util-to-markdown'
/** @type {import('mdast').Root} */
const tree = {
  type: 'root',
  children: [
    {type: 'paragraph', children: [{type: 'text', value: ' this '}]}
  ]
}
console.log(toMarkdown(tree))
->  this 
That way, it roundtrips. We can turn anything into a character reference.
x * this *
x * this *
x *.this.*
x *.this.*
->
<p>x * this *</p>
<p>x <em> this </em></p>
<p>x <em>.this.</em></p>
<p>x <em>.this.</em></p>
Emphasis can form based on the kind of character before and after the “run”, which can be whitespace, punctuation, or anything else. A bigger example (though still reduced, because left and right runs are the same and only looking at asterisks):
|                         | A (letter inside) | B (punctuation inside) | C (whitespace inside) | D (nothing inside) |
| ----------------------- | ------------- | ------------------ | ----------------- | -------------- |
| 1 (letter outside)      | x*y*z         | x*.*z              | x* *z             | x**z           |
| 2 (punctuation outside) | .*y*.         | .*.*.              | .* *.             | .**.           |
| 3 (whitespace outside)  | x *y* z       | x *.* z            | x * * z           | x ** z         |
| 4 (nothing outside)     | *x*           | *.*                | * *               | **             |
->
Inspecting that, we can divide them into two groups:
Now given our magic trick: we can turn letters (1 and A) and whitespace (3 and C) into punctuation (2 and B), by turning them into references (which start with  
Visualizing that:
We observe some interesting aspects:
In this case though, we only care about list X: we’re looking at an AST 
import {toMarkdown} from 'mdast-util-to-markdown'
/** @type {import('mdast').Root} */
const tree = {type: 'root', children: [
  {type: 'paragraph', children: [{type: 'text', value: 'a '},
  {type: 'emphasis', children: []},
  {type: 'text', value: ' b'}]}
]}
console.log(toMarkdown(tree))
-> (current) a ** b
->
From the interesting aspects above, we found that we can add an encoded whitespace (a normal space? zero-width space? no-break space?) inside it:
punctuation around:
a.* *.b (space)
a.*​*.b (zwsp)
a.* *.b (nbsp)
whitespace around:
a * * b (space)
a *​* b (zwsp)
a * * b (nbsp)
-> whitespace around:
It’s not perfect, adding that space, as expressed before. a b? | 
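The “magic trick” described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used in mdast-util-to-markdown: `classify`, `flanking`, and `encodeEdges` are hypothetical names, the flanking test follows the left- and right-flanking rules for `*` runs in the CommonMark spec, and `\s` is used as an approximation of Unicode whitespace.

```javascript
// Classify a character the way the emphasis rules look at it.
// `undefined` stands for the start or end of the line, which counts as whitespace.
function classify(character) {
  if (character === undefined || /\s/u.test(character)) return 'whitespace'
  if (/[\p{P}\p{S}]/u.test(character)) return 'punctuation'
  return 'other'
}

// Left-/right-flanking test for an asterisk run, given the characters around it.
function flanking(before, after) {
  const b = classify(before)
  const a = classify(after)
  return {
    canOpen: a !== 'whitespace' && (a !== 'punctuation' || b !== 'other'),
    canClose: b !== 'whitespace' && (b !== 'punctuation' || a !== 'other')
  }
}

// Hypothetical helper: turn one leading and one trailing whitespace character
// into a hexadecimal character reference, which starts with `&` (punctuation).
function encodeEdges(value) {
  const encode = (character) =>
    '&#x' + character.codePointAt(0).toString(16).toUpperCase() + ';'
  return value.replace(/^\s/u, encode).replace(/\s$/u, encode)
}

const inner = ' this '
const encoded = encodeEdges(inner)

console.log(encoded) // &#x20;this&#x20;
console.log(flanking(' ', inner[0]).canOpen) // false: `* this *` stays literal
console.log(flanking(' ', encoded[0]).canOpen) // true: `*&#x20;this&#x20;*` can open
```

With a letter directly before the run (`flanking('x', '&')`), `canOpen` is still false in this sketch, which is presumably why characters outside the run may need to become references too.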
-
| I stumbled upon this thread trying to figure out how to deal with whitespace in inline HTML elements (created in some WYSIWYG editor) when converting it to markdown.
I'm running this: ... through  
Which doesn't work as markdown: 
Similarly if  
So unlike @gschlager I didn't get it from parsing markdown but from parsing HTML to hast to mdast to markdown, still leading to the same invalid output. I'm trying to figure out if I can somehow "sanitize" the whitespace characters to lift them out of the inline elements to prepare for the markdown transformation but I haven't figured it out yet. | 
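A minimal sketch of the “lift the whitespace out of the inline element” idea, applied to the tree from the first post. It assumes plain mdast-shaped objects; `liftEdgeWhitespace` and `marks` are hypothetical names and not part of any library. Note that it moves the whitespace outside the emphasis, so the tree no longer says the spaces were bold, and it does not merge adjacent text nodes.

```javascript
// Node types whose leading/trailing whitespace breaks the serialized markdown.
const marks = new Set(['emphasis', 'strong', 'delete'])

// Hypothetical helper: move edge whitespace of emphasis-like nodes into
// sibling text nodes of the parent, and drop nodes that end up empty.
function liftEdgeWhitespace(node) {
  if (!Array.isArray(node.children)) return node
  const children = []
  for (const child of node.children) {
    liftEdgeWhitespace(child) // handle nested nodes first so whitespace bubbles up
    if (!marks.has(child.type)) {
      children.push(child)
      continue
    }
    const first = child.children[0]
    const last = child.children[child.children.length - 1]
    let leading = ''
    let trailing = ''
    if (first && first.type === 'text') {
      leading = first.value.match(/^\s*/)[0]
      first.value = first.value.slice(leading.length)
    }
    if (last && last.type === 'text') {
      trailing = last.value.match(/\s*$/)[0]
      last.value = last.value.slice(0, last.value.length - trailing.length)
    }
    // Remove text nodes that became empty after trimming.
    child.children = child.children.filter(
      (d) => d.type !== 'text' || d.value !== ''
    )
    if (leading) children.push({type: 'text', value: leading})
    if (child.children.length > 0) children.push(child)
    if (trailing) children.push({type: 'text', value: trailing})
  }
  node.children = children
  return node
}

const tree = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        {type: 'text', value: 'foo'},
        {type: 'strong', children: [{type: 'text', value: ' this is bold '}]},
        {type: 'text', value: 'bar'}
      ]
    }
  ]
}

liftEdgeWhitespace(tree)
console.log(tree.children[0].children.map((d) => d.type).join(','))
// text,text,strong,text,text
```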
-
| I initially tried to find a solution based on the one by @danburzo, but found that it also failed in some edge cases. So I wrote my own recursive algorithm that bubbles the space up to the root node and also takes care of a number of other cases where the translation can fail. If this is a valid solution, I would like to contribute this code to the library where the translation is happening. Any help would be welcome.
const cleanUpSpaces = () => {
  return /** @param {import('hast').Nodes | import('mdast').TopLevelContent} htmlTree */ (htmlTree) => {
    /**
     * @param {import('hast').Parent} node
     * @param {import('hast').Parent | null} parent
     * @param {number} index
     * @returns {void}
     */
    const visitNode = (node, parent, index) => {
      // text nodes will not have any children property, so this will do an early return for all such tags.
      if (!node.children) {
        return;
      }
      /**
       * if a strong, del or em tag doesn't have any children (the children array is present,
       * but is empty), remove it, as it can cause the translation to break.
       * eg: <p><strong></strong></p> -----> <p></p>
       */
      if (node.children.length === 0) {
        parent?.children.splice(index, 1);
        return;
      }
      /**
       * Traversing in reverse order because the visitNode function will change the array in which the node is present,
       * and in the case of leading spaces, the space gets extracted and inserted at the index of the node and the node gets displaced
       * into the next place, so in the case of forward iteration, this node is again processed.
       *
       * Also in the case when one node is deleted from the array, the next node will take its place, and in the case
       * of forward iteration, this sibling node can get omitted.
       */
      for (let i = node.children.length - 1; i >= 0; i--) {
        const child = node.children[i];
        visitNode(child, node, i);
      }
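      if (node.type !== 'strong' && node.type !== 'emphasis' && node.type !== 'delete') {
        return;
      }
      // The loop above may have removed every child (eg: <strong><em></em></strong>); drop the now-empty node too.
      if (node.children.length === 0) {
        parent?.children.splice(index, 1);
        return;
      }
      const firstChild = node.children[0];
      const lastChild = node.children[node.children.length - 1];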
      if (firstChild.type === 'text') {
        /**
         * Looking for leading spaces:
         * <p><strong><text> Hello</text></strong></p> -----> <p><text> </text><strong><text>Hello</text></strong></p>
         */
        // [\s\S] instead of `.` so that text containing line breaks still matches.
        const [, leadingSpaces, textValue] = firstChild.value.match(/^(\s+)([\s\S]*)$/) || [];
        if (leadingSpaces && leadingSpaces.length > 0) {
          firstChild.value = textValue;
          parent?.children.splice(index, 0, {
            type: 'text',
            value: leadingSpaces,
          });
          // index is incremented because the space was inserted in the place of the node, so the index of the node will change.
          index += 1;
        }
      }
      if (lastChild.type === 'text') {
        /**
         * Looking for trailing spaces:
         * <p><strong><text>Hello </text></strong></p> -----> <p><strong><text>Hello</text></strong><text> </text></p>
         */
        // [\s\S] instead of `.` so that text containing line breaks still matches.
        const [, textValue, trailingSpaces] = lastChild.value.match(/^([\s\S]*?)(\s+)$/) || [];
        if (trailingSpaces && trailingSpaces.length > 0) {
          lastChild.value = textValue;
          parent?.children.splice(index + 1, 0, {
            type: 'text',
            value: trailingSpaces,
          });
        }
      }
      /**
       * whenever a strong/del/em node has a single text node as a child and the value of the text node is an empty string,
       * remove the node.
       * eg: <p><strong><text></text></strong></p> ----> <p></p>; This can happen when there was a whitespace in the text
       *     node, which was identified as a leading space and was extracted outside.
       */
      if (firstChild === lastChild && firstChild.type === 'text' && firstChild.value === '') {
        parent?.children.splice(index, 1);
      }
      return;
    };
    visitNode(htmlTree, null, 0);
  };
}; | 
-
| Hi @wooorm, version 2.1.1 or commit syntax-tree/mdast-util-to-markdown@97fb8181 introduces a breaking change in the Markdown round trip. | 
-
| Oh my god, thank you so much @wooorm 💜 | 