@mislav commented Jun 15, 2025

When `Test_parser()` iterates over the myriad test pages in the suite, if any parsed text for an article doesn't match what's in the `expected.html` fixture, the test will generate a failure message using `DiffPrettyText` from diffmatchpatch.

This diff formatter outputs the complete text, with additions and removals marked up in green and red using ANSI escape sequences. While this is fine for shorter text, some of the test pages are very long, and even when the mismatch is just one word or one line, the entire text is printed with the test failure. This makes the overall `Test_parser` output much harder to scroll through and comprehend.

I was inspired to make this improvement after trying to fix some bugs I found in go-readability. When I changed the parser implementation, the change would sometimes affect the text content of the extracted article and `Test_parser` would start failing, but the output was always far too long, which made it nearly impossible to spot the actual cause of the failure.

This change replaces the test failure message with a diff in which long passages of unchanged text are truncated with `<...>`. It also adds the diff markers `{+ +}` for additions and `[- -]` for removals, the same markers used by the word diff mode of git-diff(1). These markers can make additions and removals more visible to people who have trouble telling green from red.

Here is an example test failure whose output previously spanned several terminal viewports. Note the addition `{+widely used +}` and the removal `[- or optimizing your document-]`:

```
$ go test -timeout 30s -run '^Test_parser$'
--- FAIL: Test_parser (1.65s)
    --- FAIL: Test_parser/koreader-guide (0.04s)
        parser_test.go:75: text content is different:
            node: <html><head></head><body><div id="readability-page-1" class="page"><div> <p> <br/> <img loading="lazy" src="http://fakeh
            text: <...>ts into more flexible EPUB format. Most {+widely used +}applications for converting[- or optimizing your document-] for your mobile reading device are Cali<...>
FAIL
exit status 1
FAIL    github.com/go-shiori/go-readability     1.895s
```
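The truncation and word-diff markers described above can be sketched in Go roughly as follows. This is a minimal sketch, not the code from this PR: `Operation`, `Diff`, and `formatWordDiff` are hypothetical local stand-ins shaped like diffmatchpatch's diff chunks (an operation type plus text), so the example runs with the standard library only. The 40-rune context window is also an assumption, chosen because it reproduces the truncation in the sample output above.

```go
package main

import (
	"fmt"
	"strings"
)

// Operation mirrors the three kinds of diff chunk (delete, equal, insert).
// Hypothetical stand-in so this sketch has no external dependencies.
type Operation int

const (
	OpDelete Operation = iota - 1
	OpEqual
	OpInsert
)

// Diff is a hypothetical stand-in for a diffmatchpatch diff chunk.
type Diff struct {
	Type Operation
	Text string
}

// formatWordDiff renders diffs with git-diff(1) word-diff markers:
// insertions as {+text+} and deletions as [-text-]. Unchanged runs keep at
// most contextLen runes next to each neighbouring change; any longer stretch
// is replaced with "<...>". Assumes adjacent equal chunks are already merged.
func formatWordDiff(diffs []Diff, contextLen int) string {
	var b strings.Builder
	for i, d := range diffs {
		switch d.Type {
		case OpInsert:
			b.WriteString("{+" + d.Text + "+}")
		case OpDelete:
			b.WriteString("[-" + d.Text + "-]")
		case OpEqual:
			r := []rune(d.Text)
			// A change precedes this run, so keep its leading context.
			hasBefore := i > 0
			// A change follows this run, so keep its trailing context.
			hasAfter := i < len(diffs)-1
			keep := 0
			if hasBefore {
				keep += contextLen
			}
			if hasAfter {
				keep += contextLen
			}
			if len(r) <= keep {
				// Short enough to show in full.
				b.WriteString(d.Text)
				continue
			}
			if hasBefore {
				b.WriteString(string(r[:contextLen]))
			}
			b.WriteString("<...>")
			if hasAfter {
				b.WriteString(string(r[len(r)-contextLen:]))
			}
		}
	}
	return b.String()
}

func main() {
	// Illustrative chunks shaped like the koreader-guide failure.
	diffs := []Diff{
		{OpEqual, "You can convert documents into more flexible EPUB format. Most "},
		{OpInsert, "widely used "},
		{OpEqual, "applications for converting"},
		{OpDelete, " or optimizing your document"},
		{OpEqual, " for your mobile reading device are Calibre and its plugins."},
	}
	fmt.Println(formatWordDiff(diffs, 40))
}
```

Run as-is, this prints the same `text:` line as the sample failure. Keeping context only on the side of an unchanged run that touches a change is what keeps the message short, no matter how long the article is.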
