-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Java: Add query to detect special characters in string literals #19875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -0,0 +1 @@ | |||
Violations of Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.ql |
Check warning
Code scanning / CodeQL
Query test without inline test expectations Warning test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider an inline expectations test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I quickly turned it into an inline expectations test, but it failed, I think due to the fact that the strings contain multiple special characters. So then I reverted it, and kept it as it is.
e1b966e
to
9db9200
Compare
eeaaa87
to
834cb1b
Compare
* correctness | ||
* maintainability | ||
* readability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking: you believe this matches both the correctness
and readability
categories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think readability
is definitely okay. correctness
might/could be challenged. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can belong to both. But between maintainability
and reliability
, you think it's better to choose maintainability
, because readability
is definitely an issue whereas correctness
is only occasionally an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't given it this much thought. I simply thought maintainability goes together with readability. I think we could remove the correctness
tag, and then it fully matches the guidelines, of having one top level tag and one subcategory from that tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, I think it's fine as it is actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I just read the description of this PR and it says that we shouldn't use subcategories from different top-level categories. I will discuss the issue on that PR. Maybe wait until that discussion is complete to see if we should remove correctness
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to remove correctness
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already removed it. I think it's okay if we keep the tags to a minimum for the time being.
@@ -0,0 +1 @@ | |||
Violations of Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.ql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider an inline expectations test?
- [Unicode Control Characters](https://www.unicode.org/charts/PDF/U0000.pdf). | ||
- [Unicode C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes). | ||
- [Unicode characters with property "WSpace=yes" or "White_Space=yes"](https://en.wikipedia.org/wiki/Unicode_character_property#Whitespace). | ||
- [Java String Literals](https://docs.oracle.com/javase/tutorial/java/data/characters.html). | ||
- [Java Class Charset](https://docs.oracle.com/javase/8/docs/api///?java/nio/charset/Charset.html). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit pedantic, but these don't follow the style guide for references. Examples:
- Java API Specification: [Pattern.MULTILINE](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html#MULTILINE)
- Wikipedia: [Template method pattern](https://en.wikipedia.org/wiki/Template_method_pattern).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good—only trivial comments 👍
... Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.md
Outdated
Show resolved
Hide resolved
... Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.md
Outdated
Show resolved
Hide resolved
... Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.md
Outdated
Show resolved
Hide resolved
... Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.md
Outdated
Show resolved
Hide resolved
... Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.md
Outdated
Show resolved
Hide resolved
... Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.md
Outdated
Show resolved
Hide resolved
... Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.md
Outdated
Show resolved
Hide resolved
... Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.md
Outdated
Show resolved
Hide resolved
... Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.md
Outdated
Show resolved
Hide resolved
* @tags quality | ||
* maintainability | ||
* readability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaelnebel and I concluded that you can keep correctness
, btw.
4fcbdc6
to
14a91dc
Compare
I just noticed that this is |
I think we can increase the precision. There are 139 alerts on the top 1000 MRVA repos. |
14a91dc
to
ddfd774
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad this will reach more people. It isn't about the number of alerts, but the probability that an alert will be a TP/FP. But presumably since you went for very-high
you feel pretty good about the FP rate from the alerts that you looked at.
ddfd774
to
ccbf705
Compare
This pull request introduces a new query to detect non-explicit control and whitespace characters in Java literals, improving code readability and reducing potential bugs caused by invisible or hard-to-recognize characters.
Manually testing autofix works, the special characters are replaced with
\u0...
.