Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat punctuation as part of a word for softwrap. #12471

Closed
wants to merge 1 commit into from

Conversation

rhogenson
Copy link
Contributor

In addition to the examples in the tests, wrapping at punctuation tends to break URLs across lines which makes them more difficult to copy-paste. For code, the result is more subjective, so I will show a few examples here

Before:
before1

After:
after1

Before:
before2

After:
after2

Fixes #11428

In addition to the examples in the tests, wrapping at punctuation tends
to break URLs across lines which makes them more difficult to
copy-paste. For code it tends to give better results as well, except in
cases where max_wrap is very small.
@darshanCommits
Copy link
Contributor

I never knew I needed this. thanks

@nik-rev
Copy link
Contributor

nik-rev commented Jan 10, 2025

Wow, thank you for this! Absolutely 100% agree with darshanCommits

I think the code is easier to read now!

@kirawi kirawi added the A-core Area: Helix core improvements label Jan 19, 2025
@pascalkuthe
Copy link
Member

pascalkuthe commented Feb 1, 2025

I am not a big fan of this one. For the first example I prefer the before case (wrapping at the bracket is pertty standard and even the standard formatting style for many languages). I explicitly wanted to wrap at brackets, commas, periods, :: and other path seperators.

For example if you have `foo.bar.baz(foo2.bar2.baz2) it would treat that as one big word and therefore wrap this at any random position where I think wrapping at the periods or brackets would be better.

Same for URLs wrappings https://www.google.com as [https://www.][google.com] seems way better than [https://www.go][ogle.com]

This is relevant in cases where a word is too long to be wrapped (which is a setting to be fair but I am not a fan of wrapping too long words in general). One way to resolve this by making punction weak word boundaries that only get used if wrapping the full word would be longer than some configurable value X.

The second change I would like to see is that trailing whitespace and punctuation still belong to the previous word (instead of forming their own word essentially)

@rhogenson
Copy link
Contributor Author

Thanks for the review. Personally, I'm also not convinced that the new results are always better. However, my real motivation for this change is #11738, where this same code is used for hard-wrap. Hard-wrapping a URL across lines is clearly a bug since it breaks the ability to copy-paste it (for soft-wrapping I agree it can look better to break at . or /).

If we want to keep breaking at punctuation for soft-wrap, it will probably need to be configurable so that the hard-wrap code paths can avoid it.

I'll close this PR, but I'm interested in making more improvements to text wrapping so I'll try investigating the other improvements you mentioned: treating punctuation as part of the previous word, and using punctuation as a "weak" word boundary that is less preferred to breaking at whitespace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core Area: Helix core improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Soft wrapping does a line break in the middle of ".,"
5 participants