daboross commented Jun 5, 2021

This lacks support for quote authors, lists, and referencing past used URLs. It's also failing a few other tests that I have not investigated.

I've run out of steam for adding the remaining bits of support here, but I figure it's mostly complete, so I'm opening this PR in case anyone else wants to use this WIP.

This has support for running the given test cases. Here's a log of the ones currently failing:

Failing test cases
Running test case "*‘bold’"
Running test case "_‘underlined’"
Running test case "-‘strikethrough’"
Running test case "~‘italics’"
Running test case "H‘header’\nH(1)‘header’"
Running test case "H(+1)‘header’"
Running test case "H(-1)‘header’"
Running test case "[http://address]"
Running test case "link[http://address]"
Running test case "link[https://address]"
Running test case "‘multiword link’[http://address]"
Running test case "link[https://address ‘title &text[[[comment]]]’]"
Running test case "link[https://address title [.&.] text[[[comment]]]]"
test failure:
 expected: "<a href=\"https://address\" title=\"title [.&amp;.] text\">link</a>"
   actual: "link[https://address title <a href=\".&amp;.\">.&amp;.</a> text]"
for input: "link[https://address title [.&.] text[[[comment]]]]"
Running test case "‘[[[Scoping rules/]]]Code blocks’[./code-blocks]"
Running test case "‘Versioning with 100%/versions_threshold/\\‘2’ overhead’[./versioning.pq]"
Running test case "‘compares files based on which ~‘lines’ have changed’[http://www.devuxer.com/2014/02/15/why-the-mercurial-zipdoc-extension-fails-for-excel-files/]"
Running test case "text[‘title text’]"
Running test case "[text][‘title text’]"
test failure:
 expected: "[text]<abbr title=\"title text\"></abbr>"
   actual: "<abbr title=\"title text\"><a href=\"text\">text</a></abbr>"
for input: "[text][‘title text’]"
Running test case "Примечание 1: только режимы ‘r’ и ‘w’ поддерживаются на данный момент [‘мои мысли на тему режимов открытия файлов’[./File]]"
Running test case "Примечание 1: только режимы ‘r’ и ‘w’ поддерживаются на данный момент [[‘’]‘мои мысли на тему режимов открытия файлов’[./File]]"
Running test case "[[‘’][[[Справка/]]]Документация по методам доступна на данный момент только ‘на английском’[./../../built-in-types].]"
Running test case "[‘мои мысли на тему режимов открытия файлов’[./File]]"
Running test case "link[http://address][1] ‘the same link’[1]"
test failure:
 expected: "<a href=\"http://address\">link</a>[1] ‘the same link’[1]"
   actual: "<a href=\"1\"><a href=\"http://address\">link</a></a> <a href=\"1\">the same link</a>"
for input: "link[http://address][1] ‘the same link’[1]"
Running test case "[[[comment[[[[sensitive information]]]]]]]"
Running test case "[[[com]ment]]"
Running test case "[[[[comment]]]]"
Running test case "[[[[[com]m]e]n]t]"
Running test case "\n A"
test failure:
 expected: "<br />\n&emsp;A"
   actual: "\n A"
for input: "\n A"
Running test case " A"
test failure:
 expected: "&emsp;A"
   actual: " A"
for input: " A"
Running test case "a\n---=\n"
Running test case "a0‘*‘<non-bold>’’"
test failure:
 expected: "a*‘&lt;non-bold>’"
   actual: "a0‘<b>&lt;non-bold&gt;</b>’"
for input: "a0‘*‘<non-bold>’’"
Running test case "aО‘*‘<non-bold>’’"
test failure:
 expected: "a*‘&lt;non-bold>’"
   actual: "aО‘<b>&lt;non-bold&gt;</b>’"
for input: "aО‘*‘<non-bold>’’"
Running test case "<<‘выравнивание по левому краю’\n>>‘выравнивание по правому краю’\n><‘выравнивание по центру’\n<>‘выравнивание по ширине’"
test failure:
 expected: "<div align=\"left\">выравнивание по левому краю</div>\n<div align=\"right\">выравнивание по правому краю</div>\n<div align=\"center\">выравнивание по центру</div>\n<div align=\"justify\">выравнивание по ширине</div>\n"
   actual: "&lt;&lt;‘выравнивание по левому краю’\n&gt;<blockquote>выравнивание по правому краю</blockquote>\n&gt;&lt;‘выравнивание по центру’\n&lt;<blockquote>выравнивание по ширине</blockquote>"
for input: "<<‘выравнивание по левому краю’\n>>‘выравнивание по правому краю’\n><‘выравнивание по центру’\n<>‘выравнивание по ширине’"
Running test case "‘’<<"
Running test case "/\\‘надстрочный\\superscript’\\/‘подстрочный\\subscript’"
Running test case "> Quote\n>‘Quote2’\n"
Running test case ">[http://address]:‘Quoted text.’"
test failure:
 expected: "<blockquote><a href=\"http://address\"><i>http://address</i></a>:<br />\nQuoted text.</blockquote>"
   actual: "<a href=\"http://address\">&gt;</a>:‘Quoted text.’"
for input: ">[http://address]:‘Quoted text.’"
Running test case ">[http://another-address][-1]:‘Quoted text.’\n>[-1]:‘Another quoted text.’"
test failure:
 expected: "<blockquote><a href=\"http://another-address\"><i>http://another-address</i></a>:<br />\nQuoted text.</blockquote>\n<blockquote>Another quoted text.</blockquote>"
   actual: "<a href=\"-1\"><a href=\"http://another-address\">&gt;</a></a>:‘Quoted text.’<a href=\"-1\">&gt;</a>:‘Another quoted text.’"
for input: ">[http://another-address][-1]:‘Quoted text.’\n>[-1]:‘Another quoted text.’"
Running test case ">‘Author\'s name’[http://address]:‘Quoted text.’"
test failure:
 expected: "<blockquote><i><a href=\"http://address\">Author\'s name</a></i>:<br />\nQuoted text.</blockquote>"
   actual: "<a href=\"http://address\"><blockquote>Author\'s name</blockquote></a>:‘Quoted text.’"
for input: ">‘Author\'s name’[http://address]:‘Quoted text.’"
Running test case ">‘Author\'s name’:‘Quoted text.’"
test failure:
 expected: "<blockquote><i>Author\'s name</i>:<br />\nQuoted text.</blockquote>"
   actual: "<blockquote>Author\'s name</blockquote>:‘Quoted text.’"
for input: ">‘Author\'s name’:‘Quoted text.’"
Running test case "‘Quoted text.’:‘Author\'s name’<"
test failure:
 expected: "<blockquote>Quoted text.<br />\n<div align=\'right\'><i>Author\'s name</i></div></blockquote>"
   actual: "‘Quoted text.’:‘Author\'s name’&lt;"
for input: "‘Quoted text.’:‘Author\'s name’<"
Running test case ">‘Как люди думают. Дмитрий Чернышев. 2015. 304с.’:‘[[[стр.89:]]]...’"
test failure:
 expected: "<blockquote><i>Как люди думают. Дмитрий Чернышев. 2015. 304с.</i>:<br />\n...</blockquote>"
   actual: "<blockquote>Как люди думают. Дмитрий Чернышев. 2015. 304с.</blockquote>:‘...’"
for input: ">‘Как люди думают. Дмитрий Чернышев. 2015. 304с.’:‘[[[стр.89:]]]...’"
Running test case ">‘>‘Автор против nullable-типов?’\nДа. Адрес, указывающий на незаконный участок памяти, сам незаконен.’"
Running test case ">‘> Автор против nullable-типов?\nДа. Адрес, указывающий на незаконный участок памяти, сам незаконен.’"
Running test case "‘понимание [[[процесса]]] разбора [[[разметки]]] человеком’[‘говоря проще: приходится [[[гораздо]]] меньше думать о том, будет это работать или не будет, а просто пишешь в соответствии с чёткими/простыми/логичными правилами, и всё’]"
Running test case ". unordered\n. list"
test failure:
 expected: "• unordered<br />\n• list"
   actual: ". unordered<br />\n. list"
for input: ". unordered\n. list"
Running test case "A\n```\nlet s2 = str\n        .lowercaseString\n        .replace(\"hello\", withString: \"goodbye\")\n```\nB\nC"

alextretyak (Member) commented

Great, thank you in advance!

This lacks support for ... referencing past used URLs

Please note that pqmarkup-lite, in contrast to pqmarkup, does not support referencing previously used URLs. pqmarkup also uses a slightly different syntax for such references: link[http://address][-1] rather than link[http://address][1] (see the tests for pqmarkup). The test with link[http://address][1] exists only to verify that [1] is ignored, i.e. ‘not considered as a link’.

daboross (Author) commented Jun 5, 2021

Please note that pqmarkup-lite, in contrast to pqmarkup, does not support referencing previously used URLs. pqmarkup also uses a slightly different syntax for such references: link[http://address][-1] rather than link[http://address][1] (see the tests for pqmarkup). The test with link[http://address][1] exists only to verify that [1] is ignored, i.e. ‘not considered as a link’.

Ah, ok. Thank you for the clarification!

I've fixed part of this - it will no longer try to produce links that wrap other links. However, the test is still failing and I'm not 100% sure what I should do.

It seems like the expected output leaves the second ‘the same link’[1] unlinked as well. But what's the difference between this and, for instance, ‘the same link’[http://address]? Both look like they could be valid addresses, so the current Rust code generates links for both of them.

test failure:
 expected: "<a href=\"http://address\">link</a>[1] ‘the same link’[1]"
   actual: "<a href=\"http://address\">link</a>[1] <a href=\"1\">the same link</a>"
for input: "link[http://address][1] ‘the same link’[1]"
Updated test results

The updated log is identical to the first log above except for the actual output of two cases: the link[http://address][1] case shown just above, and the following one.

Running test case ">[http://another-address][-1]:‘Quoted text.’\n>[-1]:‘Another quoted text.’"
test failure:
 expected: "<blockquote><a href=\"http://another-address\"><i>http://another-address</i></a>:<br />\nQuoted text.</blockquote>\n<blockquote>Another quoted text.</blockquote>"
   actual: "<a href=\"http://another-address\">&gt;</a>[-1]:‘Quoted text.’<a href=\"-1\">&gt;</a>:‘Another quoted text.’"
for input: ">[http://another-address][-1]:‘Quoted text.’\n>[-1]:‘Another quoted text.’"

alextretyak (Member) commented Jun 5, 2021

But what's the difference between this and, for instance, ‘the same link’[http://address]?

Links can start only with http or ./ (see line #264 of the reference implementation).
Some examples:

  • [http://google.com] — this is a valid link
  • [./relative-address] — and this
  • [www.google.com] — but this is not a link
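
The rule above can be sketched as a simple prefix check. This is only an illustration: the helper name `is_link_target` and the constant `LINK_PREFIXES` are hypothetical and are not taken from the reference implementation, which may perform the check differently.

```python
# Prefixes named in the comment above; everything else here is a
# hypothetical illustration, not the reference implementation's code.
LINK_PREFIXES = ("http", "./")


def is_link_target(bracket_content: str) -> bool:
    """Return True if the text inside [...] should be treated as a link address."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    return bracket_content.startswith(LINK_PREFIXES)
```

Under this check, the `[1]` in the test case `link[http://address][1] ‘the same link’[1]` is not a link target, which is consistent with the expected output leaving it as plain text.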

alextretyak (Member) commented

@daboross
Are you planning to finish your pqmarkup-lite implementation?

ExpHP commented Jun 28, 2021

I've taken a stab at this and will briefly share my experience. The TL;DR is that, while I want this design to succeed, it gets increasingly tricky to replicate all of the quirks of the reference implementation, simply because the two implementations are so different in architecture.


It seems the two biggest missing features are blockquotes-with-sources (>‘...’:‘...’ and ‘...’:‘...’<) and lists. For syntax involving quotes with a single control character, the logic is handled by this function call. For blockquotes-with-sources, I think it probably makes the most sense to add another function that gets called before this which looks at sequences of 4 (?) child nodes.
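
A rough sketch of what such a pass over sequences of four child nodes might look like for the `>‘author’:‘text’` form. The node types (`Text`, `Quoted`, `Html`) and the function name are hypothetical stand-ins, not the PR's actual AST, and the sketch ignores details such as requiring `>` to be at the start of a line. The output shape follows the expected value in the failing test for `>‘Author's name’:‘Quoted text.’`.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:      # plain text between quoted spans (hypothetical node type)
    value: str


@dataclass
class Quoted:    # contents of a ‘...’ span (hypothetical node type)
    value: str


@dataclass
class Html:      # already-rendered output (hypothetical node type)
    value: str


Node = Union[Text, Quoted, Html]


def fold_quotes_with_authors(nodes: List[Node]) -> List[Node]:
    """Replace each [Text('...>'), Quoted, Text(':'), Quoted] run with one blockquote."""
    out: List[Node] = []
    i = 0
    while i < len(nodes):
        w = nodes[i:i + 4]
        if (len(w) == 4
                and isinstance(w[0], Text) and w[0].value.endswith(">")
                and isinstance(w[1], Quoted)
                and isinstance(w[2], Text) and w[2].value == ":"
                and isinstance(w[3], Quoted)):
            before = w[0].value[:-1]  # keep any text that preceded the '>'
            if before:
                out.append(Text(before))
            out.append(Html("<blockquote><i>" + w[1].value + "</i>:<br />\n"
                            + w[3].value + "</blockquote>"))
            i += 4  # the whole window has been consumed
        else:
            out.append(nodes[i])
            i += 1
    return out
```

Running this pass before the single-control-character handling would keep `>‘Author's name’` from being turned into a plain blockquote first, which is what the current failing output shows.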

Before doing that, however, I decided to try an easier feature: The >>‘...’, ><‘...’, etc. ways of controlling text alignment. This is really easy to add, but I got the following test failure (where expected has a \n at the end):

test failure:
   expected: "<div align=\"left\">выравнивание по левому краю</div>\n<div align=\"right\">выравнивание по правому краю</div>\n<div align=\"center\">выравнивание по центру</div>\n<div align=\"justify\">выравнивание по ширине</div>\n"
   actual: "<div align=\"left\">выравнивание по левому краю</div>\n<div align=\"right\">выравнивание по правому краю</div>\n<div align=\"center\">выравнивание по центру</div>\n<div align=\"justify\">выравнивание по ширине</div>"

and the AST looks like this:

Ok(Root([
    Text(""),
    ProcessedPrefixSuffix("<div align=\"left\">", [Text("выравнивание по левому краю")], "</div>"),
    Text("\n"),
    ProcessedPrefixSuffix("<div align=\"right\">", [Text("выравнивание по правому краю")], "</div>"),
    Text("\n"),
    ProcessedPrefixSuffix("<div align=\"center\">", [Text("выравнивание по центру")], "</div>"),
    Text("\n"),
    ProcessedPrefixSuffix("<div align=\"justify\">", [Text("выравнивание по ширине")], "</div>")
]))

There's no newline at the end of actual because the input had no newline there. The newline in the expected is because the original implementation always prints \n after these </div>s and then suppresses newlines from the source text if a </div> has just been written.

Now, something similar to the original behavior could be achieved by emitting "</div>\n" (or a new AST node type which does the same) and then adding a post-processing pass that strips leading \n from Text when the previous child ends with that. But this is a lot of effort to replicate a behavior that (a) was expressed in two lines of the original source and (b) has little impact on meaningful program output on content which is likely rendered with white-space: normal.
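
For illustration, the post-processing pass described above might look roughly like this. The node classes mirror the names in the AST dump, but their Python form, the `render` helper, and the function names are assumptions made for this sketch and are not part of the PR.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class ProcessedPrefixSuffix:
    prefix: str
    children: List["Node"]
    suffix: str


Node = Union[Text, ProcessedPrefixSuffix]


def strip_newlines_after_div(nodes: List[Node]) -> List[Node]:
    # Drop one leading newline from a Text node when the previous sibling
    # already ended its output with "</div>" followed by a newline.
    result: List[Node] = []
    for node in nodes:
        prev = result[-1] if result else None
        if (isinstance(node, Text) and node.value.startswith("\n")
                and isinstance(prev, ProcessedPrefixSuffix)
                and prev.suffix.endswith("</div>\n")):
            node = Text(node.value[1:])
        result.append(node)
    return result


def render(nodes: List[Node]) -> str:
    # Concatenate text and recursively rendered children.
    parts = []
    for node in nodes:
        if isinstance(node, Text):
            parts.append(node.value)
        else:
            parts.append(node.prefix + render(node.children) + node.suffix)
    return "".join(parts)
```

With every alignment node emitting `"</div>\n"` as its suffix, rendering the stripped list yields one newline after each `</div>`, including the last one, which is the shape of the expected output in the failing test.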

In the long term, I fear that I may run into even stranger quirks.

For that reason, I've decided for now to focus on something closer in design to the original implementation.

alextretyak (Member) commented Jun 28, 2021

@ExpHP

but I got the following test failure

I think it is reasonable to fix this test by adding a newline at the end of the input. (Here is a fixed tests.txt.)

the original implementation always prints \n after these </div>s

This was just easier to implement.
The current behavior of the Python implementation is an unintentional side effect [and because of that I think tests.txt must be fixed anyway].

and then suppresses newlines from the source text if a </div> has just been written

No, the code you are referencing is intended for notes (where '</div>' is appended to ending_tags):

elif prevc == '!':
    write_to_pos(prevci, i + 1)
    outfile.write('<div class="note">')
    ending_tags.append('</div>')

Also, the original implementation does not suppress newlines; it suppresses <br /> tags (by setting new_line_tag = '' and then checking whether new_line_tag was set).
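
The distinction can be shown with a toy model. It is not the reference code: the variable name `new_line_tag` comes from the comment above, while the sentinel value, the function name, and the input shape are assumptions made for this sketch.

```python
from typing import List, Tuple

UNSET = "\0"  # assumed sentinel meaning "no override for the next line break"


def join_lines(lines: List[Tuple[str, bool]]) -> str:
    """Join rendered source lines; each item is (html, ends_with_closing_block_tag)."""
    out = []
    new_line_tag = UNSET
    for index, (html, closes_block) in enumerate(lines):
        if index > 0:
            # A source newline is always written; only the <br /> in front
            # of it is replaced when new_line_tag has been set.
            out.append(("<br />" if new_line_tag == UNSET else new_line_tag) + "\n")
            new_line_tag = UNSET
        out.append(html)
        if closes_block:
            new_line_tag = ""  # suppress the <br /> after a closing block tag
    return "".join(out)
```

In this model the newline after a closing block tag is kept and only the `<br />` is dropped, while ordinary line breaks still produce `<br />` followed by a newline.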

ExpHP commented Jul 8, 2021

Whoops, sorry, I just remembered about this. I had a high-priority work-related project come up that took up all of my free time since my last message. That is more or less finished now so I can take another stab at this.

daboross (Author) commented Jul 8, 2021

@daboross
Are you planning to finish your pqmarkup-lite implementation?

I don't have any current plans for this. I left this PR open in case anyone else wanted to finish it, but we could also just close it.

Links can start only with http or ./ (see line #264 of the reference implementation).
Some examples:

* `[http://google.com]` — this is a valid link

* `[./relative-address]` — and this

* `[www.google.com]` — but this is not a link

This makes sense! I hadn't realized that, and I do mostly disagree with this design choice (I don't think markup languages should be deciding what's a valid link), but it should be fairly easy to fix in the PR.

The other quirks... As @alextretyak noticed, there are a number of behaviors that seem really only replicable with the original design.

This AST-based implementation can work somewhat, but I think it would take a lot of effort to create a clean and nice-looking Rust implementation that also retains compatibility with the original pqmarkup-lite.

alextretyak (Member) commented Jul 9, 2021

I just want to tell all implementers that I allow fixing tests.txt, and even the original Python implementation, if there is a good reason for it, so blind compatibility with the original implementation is not strictly necessary.
