Test_parser: make failures for text content mismatches more readable #96
When `Test_parser()` iterates over the myriad test pages in the suite and the parsed text of an article doesn't match what's in the `expected.html` fixture, the test generates a failure message using `DiffPrettyText` from diffmatchpatch. This diff formatter outputs the complete text, with additions and removals marked up in green and red using ANSI terminal escape sequences. That is fine for shorter texts, but some of the test pages are very long. Even when the mismatch is a single word or line, the entire text is printed with the test failure, which makes the overall `Test_parser` output much harder to scroll through and understand.
I was inspired to make this improvement after trying to fix some bugs I found in go-readability. When I changed the parser implementation, the change would sometimes affect the text content of the extracted article and `Test_parser` would start failing, but the output was always far too long, making it almost impossible to spot the actual cause of the failure.

This change replaces the test failure message with a diff in which long passages of text are truncated with `<...>`. Furthermore, it adds the diff markers `{+ +}` for additions and `[- -]` for removals, the same markers that git-diff(1) uses in word diff mode. These make additions and removals more visible to people who can't easily tell green from red.

Example test failure, where the output previously spanned several terminal viewports; notice the addition `{+widely used +}` and the removal `[- or optimizing your document-]`.
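A minimal sketch of the formatting idea described above, in Go. This is not the PR's actual implementation: the `Operation` and `Diff` types are local stand-ins modeled on diffmatchpatch's diff types so the snippet compiles without the module, and `formatWordDiff`, `truncateEqual`, and the `context` parameter are hypothetical names. Each insertion is wrapped in `{+ +}`, each removal in `[- -]`, and long unchanged passages keep only `context` characters next to a neighbouring change, with the rest replaced by `<...>`.

```go
package main

import (
	"fmt"
	"strings"
)

// Operation is a hypothetical stand-in modeled on diffmatchpatch.Operation.
type Operation int8

const (
	DiffDelete Operation = -1
	DiffInsert Operation = 1
	DiffEqual  Operation = 0
)

// Diff is a hypothetical stand-in modeled on diffmatchpatch.Diff:
// one chunk of text plus what happened to it.
type Diff struct {
	Type Operation
	Text string
}

// formatWordDiff renders diffs with git-style word-diff markers and
// truncates long unchanged passages to `context` runes per side.
func formatWordDiff(diffs []Diff, context int) string {
	var b strings.Builder
	for i, d := range diffs {
		switch d.Type {
		case DiffInsert:
			b.WriteString("{+" + d.Text + "+}")
		case DiffDelete:
			b.WriteString("[-" + d.Text + "-]")
		case DiffEqual:
			b.WriteString(truncateEqual(d.Text, context, i == 0, i == len(diffs)-1))
		}
	}
	return b.String()
}

// truncateEqual keeps `context` runes on each side of an unchanged passage
// that borders a change; a side with no neighbouring change keeps nothing.
// Working on runes avoids cutting a multi-byte UTF-8 character in half.
func truncateEqual(text string, context int, first, last bool) string {
	r := []rune(text)
	head, tail := context, context
	if first {
		head = 0 // no change precedes this passage
	}
	if last {
		tail = 0 // no change follows this passage
	}
	if len(r) <= head+tail {
		return text // short enough to show in full
	}
	return string(r[:head]) + "<...>" + string(r[len(r)-tail:])
}

func main() {
	// Illustrative input only; not taken from the real test fixtures.
	diffs := []Diff{
		{DiffEqual, "The quick brown fox jumps over the "},
		{DiffInsert, "lazy "},
		{DiffEqual, "dog, and then it runs far away into the forest"},
		{DiffDelete, " at night"},
		{DiffEqual, ". The end of a very long story follows here."},
	}
	fmt.Println(formatWordDiff(diffs, 10))
}
```

Keying the truncation on position (first or last chunk) relies on diffs alternating between equal and changed chunks, which is how a cleaned-up character diff normally looks.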