Skip to content

Use @snippet javadoc tag for snippets #14257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jpountz opened this issue Feb 18, 2025 · 31 comments
Open

Use @snippet javadoc tag for snippets #14257

jpountz opened this issue Feb 18, 2025 · 31 comments

Comments

@jpountz
Copy link
Contributor

jpountz commented Feb 18, 2025

Description

Now that main requires Java 21, we can start using the @snippet javadoc tag, which is quite more convenient to use than the <pre class="prettyprint"></pre> HTML tags we are currently using.

One problem I noticed is that gradlew tidy messes up with formatting when @snippet tags are used.

@dweiss
Copy link
Contributor

dweiss commented Feb 18, 2025

google/google-java-format#789
google/google-java-format#886

Unfortunately, I think these make this issue a no-go to unless you want to resign from using google java formatter... which I don't think we should do.

@rmuir
Copy link
Member

rmuir commented Feb 18, 2025

Related: if we bump main to java 23+ (#14229) then we could do snippets in javadoc with the typical markdown mechanisms.

@rmuir
Copy link
Member

rmuir commented Feb 19, 2025

The markdown would have a huge advantage for the many analyzers with xml-style prettyprint docs today. I don't think @snippet works with languages other than java.

For me, editor displays the current "prettyprint" of analyzers XML docs without any syntax highlighting:

Image

On the other hand, with markdown you'd get this:

<fieldType name="text_arnormal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
  </analyzer>
</fieldType>

@rmuir
Copy link
Member

rmuir commented Mar 25, 2025

@dweiss what do you think about exploring the option to swap out spotless java.GoogleJavaFormatStep with java.EclipseJdtFormatterStep but using google-java format config: https://github.com/google/styleguide/blob/gh-pages/eclipse-java-google-style.xml

I know the history that google says nobody else's formatter is good enough, they say you need "their" formatter for the google style and that others eclipse/intellij can't do it right, but their formatter can't support @snippet nor markdown comments (it will mess them up), so they fail technically, they don't keep up with java standard.

@rmuir
Copy link
Member

rmuir commented Mar 25, 2025

The Palantir one is no better off from what I can tell, it is a fork of the google one with some cosmetic tweaks. I know eclipse JDT doesn't choke on these constructs, although I don't have much experience with their formatter, their compiler has been pretty good and they are generally responsive about new java features.

@dweiss
Copy link
Contributor

dweiss commented Mar 25, 2025

I have been using google's tool mostly because it's been fairly solid in terms of formatting between releases (so no surprise changes on updates). I also like (or have slowly grown accustomed to) what they do with the indents and more complex code blocks. There are quirks, sure, but they'll be present with any tool.

The beauty of having a single standard should be that the tool used to format the code is irrelevant as long, as it ends up with the same output. If we can make eclipse do the same thing (or even better, format snippets reliably) then it's a transparent change for the better - I don't have a problem with that at all.

dweiss added a commit to dweiss/lucene that referenced this issue Mar 25, 2025
@dweiss
Copy link
Contributor

dweiss commented Mar 25, 2025

It's close but there are many differences. See the commit above and try running it on one of the modules, for example suggest (./gradlew -p lucene/suggest spotlessApply).

Maybe it can be made closer, I haven't tried.

@rmuir
Copy link
Member

rmuir commented Mar 25, 2025

@dweiss thank you so much for that starter commit for evaluation. I will try it tonight and fire up eclipse and see what our options are.

I finally finished parser (https://github.com/rmuir/tree-sitter-javadoc) and am now looking at options to (ab)use it, to convert our docs to markdown automatically, when I discovered that google-java-format will mess up markdown comments, e.g. too-long /// will wrap to another line with // and break everything.

So currently, there is no way to escape from the prettier because neither @snippet nor markdown can work with the formatter, so no way to make a patch that passes build.

@dweiss
Copy link
Contributor

dweiss commented Mar 25, 2025

Nice!

So... google java format has this option, at least in the cmd line version:
Image

If nothing else works, we could just make a multipass and format javadocs using a different tool than
the rest of the code... Or fork gjf and implement proper javadoc formatting, which should be a fun project to work on.

@rmuir
Copy link
Member

rmuir commented Mar 25, 2025

I think in the markdown case, the bug I saw was that it didn't treat /// as javadoc but as an ordinary inline comment. But I can experiment with the option still.

@dweiss
Copy link
Contributor

dweiss commented Mar 25, 2025

google/google-java-format#1193

Disabling Javadoc formatting doesn't prevent either issue.

So it seems it's broken entirely. Argh.

@rmuir
Copy link
Member

rmuir commented Mar 26, 2025

I played with this a bit and reduced noise in two ways:

Original file:

113 files changed, 3656 insertions(+), 5216 deletions(-)

  1. Disable reformatting of Apache License header:
<setting id="org.eclipse.jdt.core.formatter.comment.format_header" value="false"/>

93 files changed, 2525 insertions(+), 3861 deletions(-)

  1. Disable reformatting of javadocs comments
    We could fight it, but my goal is to move them all to markdown format which google doesn't support anyway. Most differences are in things such as indentation of html tags.
    So the differences are just not relevant and create noise when doing comparison:
<setting id="org.eclipse.jdt.core.formatter.comment.format_javadoc_comments" value="false"/>

75 files changed, 1922 insertions(+), 3380 deletions(-)

Eclipse has a lot of options and I think this file is just out of date. With the unnecessary noise out of the way, it is a matter of exploring all the settings and trying to reduce the differences more. I'm guessing there would always be differences, but it would be nice to minimize them.

@rmuir
Copy link
Member

rmuir commented Mar 26, 2025

For the record those diffstats were based on ./gradlew -p lucene/suggest spotlessApply and include the changes of the patch/formatter XML itself

@rmuir
Copy link
Member

rmuir commented Mar 26, 2025

@dweiss I also wonder, with an "autoformat" workflow, if we even care so much.

I don't understand what is so sacrosanct about google's format: to me it is ugly. Snippet tag is from java 18 (6 releases back) and google doesn't care, they are a big corporation and probably the type to keep code on e.g. java 8. I don't think we should weigh their opinions very much on anything.

All autoformatters lead to ugliness at times, it is just the tradeoff you make to avoid hassles, and still reap the benefits of avoiding formatting bikesheds, noise in PRs, etc.

I just think autoformat the code in a consistent way, call it a day.

@dweiss
Copy link
Contributor

dweiss commented Mar 26, 2025

I just think autoformat the code in a consistent way, call it a day.

I agree, it does not matter which one you pick if it's an automated process.

I don't understand what is so sacrosanct about google's format: to me it is ugly.

My initial choice of gjf was motivated purely on subjective experience - I just liked the way it formatted code, that's it (I still do). Perhaps one more contributing factor was that I didn't want to play with a gazillion of Eclipse's options... this also facilitates the discussions concerning "which settings are best"... it's what it is, end of discussion. :)

But I don't take it personally, really. I myself filed one or two issues with them and there is definitely corporate inertia in changing things. So if switching to Eclipse lets us make advancements (like with markdown javadocs), let's just do it. I'm fine with it.

@dweiss
Copy link
Contributor

dweiss commented Mar 26, 2025

We'd probably have to apply reformatting to 10x and main to keep cherry picking easier. Other than that - it's a simple thing to do.

@rmuir
Copy link
Member

rmuir commented Mar 26, 2025

I will play with the "don't reformat javadoc option". Maybe it's an easier solution to these problems? If we can coerce Google formatter to treat /// as javadoc then problem solved. Markdown is sensitive to indentation etc so it needs to not mess with it.

@dweiss
Copy link
Contributor

dweiss commented Mar 26, 2025

I've toyed with it a bit but I don't see a way for it to not break those /// comments. An alternative is to fork it, fix what we need and then use the forked version from spotless. This is a doable alternative to using Eclipse's formatter - I really don't mind either.

@dweiss
Copy link
Contributor

dweiss commented Mar 26, 2025

https://github.com/google/google-java-format/blob/master/core/src/main/java/com/google/googlejavaformat/java/JavaCommentsHelper.java#L46-L60

All it takes would be to preserve any formatting in markdown comments (the /// lines) - a dumb check above would do the trick, I think.

@rmuir
Copy link
Member

rmuir commented Mar 26, 2025

@dweiss I'm wondering if we could send them a PR such that any /// line comment respects the --skip-javadoc-formatting flag (or some other flag to say "dont mess around"). it would at least allow projects to move forward with markdown javadoc comments, while still using google-java-format.

@dweiss
Copy link
Contributor

dweiss commented Mar 26, 2025

Yeah. I'll take a look at that, interesting. Part of the problem is that different Java versions seem to be returning a different tokenization of those comment strings. Seems like something has changed even from this issue that I filed.

google/google-java-format#1153

because when I run it now on this input:

/// In the following indented code block, `@Override` is an annotation, and not a JavaDoc tag. You must not wrap this line. Even though it's long.
///
///     @Override
///     public void m() ...
///
/// Likewise, in the following fenced code block, `@Override` is an annotation,
/// and not a JavaDoc tag:
///
/// ```
/// @Override
/// public void m() ...
/// ```
class Example {
    public static void main(String... args) {
    	/// Foo, bar baz.
        System.out.println("Hello World!");
    }
}

it breaks the first long line incorrectly:

/// In the following indented code block, `@Override` is an annotation, and not a JavaDoc tag. You
// must not wrap this line. Even though it's long.

I'll take a look in the evening. I don't think it should touch those /// lines at all.

@rmuir
Copy link
Member

rmuir commented Mar 26, 2025

@dweiss I think that is because google-java-format uses internal JDK compiler apis to parse it. just like error prone. it is why you have to add all the opens?

@dweiss
Copy link
Contributor

dweiss commented Mar 26, 2025

Yes, that's correct - google/google-java-format#1153 (comment)

@dweiss
Copy link
Contributor

dweiss commented Mar 26, 2025

Here is what I did.

  • added a brute-force non-formatting to any /// line comments in my fork of google-java-format [1]
  • added a local, precompiled binary of the above to my fork of Lucene's repository and modified spotless to use it. [2][3]

It seems to work (I've modified one of the classes, intentionally leaving a super-long line there). This is just a PoC that it's doable... I'm not sure if this patch would be accepted to google-java-format as is (it makes an assumption that anything starting with /// should just be left alone). I also don't think we can store a binary blob of google java format in Lucene repository.

I can try to initiate a discussion with google-java-format folks first. If this doesn't work, I can publish my fork under my own coordinates to Maven Central - then we won't need the local binary blob and things should just work.

[1] google/google-java-format@master...dweiss:google-java-format:preserve-markdown-like-lines
[2] https://github.com/apache/lucene/compare/main...dweiss:lucene:gjf-markdown-friendly?expand=1
[3] https://github.com/dweiss/lucene/tree/gjf-markdown-friendly

@rmuir
Copy link
Member

rmuir commented Mar 26, 2025

@dweiss very nice. the /// can have leading whitespace in front of it which is preserved too. I dont know how their parser works but you can simulate the leading-case by adding a method, e.g. by adding comment to one of the members of your FacetFieldCollector in your branch

Its this case:

/// long comment...
public class foo {
  /// another long comment...
  public void bar(){}
}

@dweiss
Copy link
Contributor

dweiss commented Mar 27, 2025

It'll work with those indented lines as well - it actually will align block indentation to the column they should be starting at. So:

    /// [Collector] for cutter+recorder pair. NOCOMMIT: This is a long line comment that should not be broken into smaller chunks.
    ///   * also, the space after /// should be preserved
    ///   * like here.
  public FacetFieldCollector(FacetCutter facetCutter, FacetRecorder facetRecorder) {
    this.facetCutter = facetCutter;
    this.facetRecorder = facetRecorder;
  }

ends up this after running tidy:

  /// [Collector] for cutter+recorder pair. NOCOMMIT: This is a long line comment that should not be broken into smaller chunks.
  ///   * also, the space after /// should be preserved
  ///   * like here.
  public FacetFieldCollector(FacetCutter facetCutter, FacetRecorder facetRecorder) {
    this.facetCutter = facetCutter;
    this.facetRecorder = facetRecorder;
  }

@dweiss
Copy link
Contributor

dweiss commented Mar 27, 2025

@rmuir
Copy link
Member

rmuir commented Mar 27, 2025

awesome! I really hope the patch is accepted: this will definitely give us a path forward.

@dweiss
Copy link
Contributor

dweiss commented Mar 27, 2025

Let's wait a few days and see if there's any feedback. Like I mentioned above, what we use for formatting/checking format adherence shouldn't really matter for anybody who uses gradlew tidy in Lucene - gcf is an ASL licensed software and if we find anything problematic, I'll just publish my own fork and we can switch to that (and then we have the freedom of doing whatever, including applying markdown formatting of our own... like https://github.com/commonmark/commonmark-java).

@rmuir
Copy link
Member

rmuir commented Mar 27, 2025

yeah: only thing I will say about choice of formatter is that normally I try to configure the editor in the repo to match what the build expects. e.g. for languages like python and typescript, it means disabling all formatting-related diagnostics and enabling "format-on-save" and "organize-imports-on-save", so that things "just work". It makes for easy dev experience where users have less friction with CI.

But with java, such a setup seems difficult to achieve: I'm not sure anyone even has that today.

@dweiss
Copy link
Contributor

dweiss commented Mar 27, 2025

You're right - this may be a problem. I use intellij and even gjf there does not work exactly the same way (google/google-java-format#1165). I stopped caring after a while. Just do whatever I need, then run a tidy before committing. I know - not ideal. But at least a workable solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants