`PrettyPrintWriter` fails to serialize characters in the Unicode Supplementary Multilingual Plane in XML 1.0 mode and XML 1.1 mode #337
- final int length = text.length();
- for (int i = 0; i < length; i++) {
- final char c = text.charAt(i);
+ text.codePoints().forEach(c -> {
I guess 1fcfa0b makes this (`@since 9`) safe.
I have no idea what you are talking about in this review comment. The method is present in Java 8.
Perhaps. Was just going by https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#codePoints() which says 9. At any rate I would hope the CI build would fail if this were not permitted.
> Perhaps
Do you have any evidence for this claim which is casting doubt on the correctness of this change and potentially making it harder for subsequent reviewers to approve? If you do not, I would suggest that you refrain from making such review comments.
https://docs.oracle.com/javase/8/docs/api/java/lang/CharSequence.html#codePoints--
The above link. It seems the `@since` tags are contradictory, unless the JDK team has a policy of noting when an override of a default method was added (which would seem strange to me since that should not change the API surface).
https://docs.oracle.com/javase/8/docs/api/java/lang/CharSequence.html#codePoints-- is present in Java 8 and this code compiles successfully on Java 8. As far as I can tell there is no action item here, and this whole review comment was unnecessary and served only to chew up some of my time to refute an unverified claim as well as potentially confusing future reviewers.
XStream 1.5.x will target Java 11. No point any longer to use Java 8 as minimum.
But XStream 1.4 still uses Java 8, and we want this critical bug fix in that line. Anyway, this change works in Java 8, so this whole thread is pointless. I have no idea why this review feedback was left in the first place.
`codePoints()` was added to the `CharSequence` interface as a default method in Java 8.
In Java 9, an override of this method was added to `String` (which implements `CharSequence`).
So it should work on both Java 8 and Java 9, but it can be slightly faster for `String`s on Java 9+ thanks to the optimised override added to `String` in Java 9.
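To make the point above concrete, here is a minimal sketch (the class and method names are hypothetical, not part of XStream). The parameter is typed as `CharSequence`, so the call compiles against the Java 8 default method; when a `String` is passed on Java 9+, the `String` override is what actually runs.

```java
import java.util.stream.Collectors;

public class CodePointIteration {
    // Hypothetical helper: lists the code points of any CharSequence.
    // codePoints() is declared on CharSequence since Java 8, so this
    // compiles and runs on Java 8 and later.
    static String describe(CharSequence text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F98A written as its surrogate pair.
        System.out.println(describe("a\uD83E\uDD8A")); // prints "U+0061 U+1F98A"
    }
}
```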
@@ -238,7 +236,7 @@ private void writeText(final String text, final boolean isAttribute) {
  case '\t':
  case '\n':
  if (!isAttribute) {
- writer.write(c);
+ writer.write(Character.toChars(c));
(Unnecessary in this case I think.)
> Unnecessary in this case I think.
How would it compile without this hunk?
Right, I just meant in this case we know the character will be a single `char`. Not important.
Right, and I knew that when deciding to use `Character.toChars(c)` in this case and the case below rather than prematurely optimizing by casting the `int` to a `char`.
This review comment was unnecessary in this case I think.
@@ -251,7 +249,7 @@ private void writeText(final String text, final boolean isAttribute) {
  + " in XML stream");
  }
  }
- writer.write(c);
+ writer.write(Character.toChars(c));
Note that this could be slightly less efficient since it allocates a `char[]`. It does not seem that the method overall is optimized.
Is there an action item here? If not, then what is the purpose of this comment?
It is not an action item, solely to note for any other reviewers that this change could affect performance, if that is even a consideration.
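For readers weighing the allocation concern, the following is a sketch of one possible alternative, not what the PR does. The class and the `writeCodePoint` helper are hypothetical; the sketch assumes that skipping the `char[]` for Basic Multilingual Plane code points is worth the extra branch, which has not been measured here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class CodePointWrite {
    // Hypothetical helper: writes one code point without allocating a
    // char[] when the code point fits in a single char (BMP).
    static void writeCodePoint(Writer writer, int codePoint) throws IOException {
        if (Character.isBmpCodePoint(codePoint)) {
            writer.write(codePoint); // Writer.write(int) writes a single char
        } else {
            writer.write(Character.toChars(codePoint)); // two-char surrogate pair
        }
    }

    // Convenience wrapper for the demo: renders code points to a String.
    static String render(int... codePoints) {
        StringWriter out = new StringWriter();
        try {
            for (int cp : codePoints) {
                writeCodePoint(out, cp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One BMP code point plus one supplementary code point: 3 chars total.
        System.out.println(render(0x41, 0x1F98A).length()); // prints 3
    }
}
```

`Character.toChars(c)` as used in the PR is the simpler choice and produces the same output for both kinds of code point; the branch above only trades a small allocation for extra code.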
Why is this not assigned to the 1.4 milestone? This is a critical bug fix that we want in 1.4.
Because 1.5.x is dropping compatibility with Java 10 and earlier, in contrast to 1.4.
I think it would make more sense for the 1.4.x line to require Java 8 or newer or to backport this fix to the 1.4.x line with a for-loop based implementation that can run on Java 7 or earlier.
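A for-loop based variant of the kind suggested here could look roughly like the sketch below. It is an illustration only, not a proposed patch: the class and method names are hypothetical, and it relies solely on `String.codePointAt` and `Character.charCount`, both available since Java 5.

```java
import java.util.Locale;

public class LegacyCodePointLoop {
    // Hypothetical helper: walks a String one code point at a time
    // without streams or lambdas, so it also runs on Java 7 and earlier.
    static String hexCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("U+").append(Integer.toHexString(cp).toUpperCase(Locale.ROOT));
            // Advance by 1 for BMP code points, by 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hexCodePoints("a\uD83E\uDD8A")); // prints "U+61 U+1F98A"
    }
}
```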
Any plans to merge this PR and release version 1.5.x requiring Java 11 or newer?
Sorry for the long delay...
`PrettyPrintWriter` fails to properly serialize characters in the Unicode Supplementary Multilingual Plane (SMP) in XML 1.0 mode and XML 1.1 mode (quirks mode works) with the following exception:

The root cause of the problem is incorrect iteration over Unicode code points. The current implementation iterates over the UTF-16 code units of the text rather than over each code point. Characters in the Supplementary Multilingual Plane are encoded in UTF-16 as two code units (a surrogate pair). For example, U+1F98A is encoded in UTF-16 as 0xD83E 0xDD8A. Java provides a dedicated API to iterate over code points, but XStream makes the erroneous assumption that a code point and a `char` are equivalent, likely because it was never tested outside of quirks mode with characters in the Supplementary Multilingual Plane. This PR fixes the problem by using the Java API for iterating over code points, thus removing that faulty assumption.
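The encoding described above can be checked with a few lines of plain JDK code. This is a small illustrative sketch (the class name is hypothetical) showing that U+1F98A occupies two `char` values but counts as one code point, which is why per-`char` iteration sees two lone surrogates.

```java
public class SurrogatePairDemo {
    // U+1F98A, the example code point from the description.
    static final String FOX = new String(Character.toChars(0x1F98A));

    // Number of UTF-16 code units (what charAt-based loops iterate over).
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points (what codePoints() iterates over).
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charCount(FOX));      // prints 2
        System.out.println(codePointCount(FOX)); // prints 1
        // The two code units are the surrogate pair 0xD83E 0xDD8A.
        System.out.printf("%04X %04X%n", (int) FOX.charAt(0), (int) FOX.charAt(1));
        System.out.println(Character.isHighSurrogate(FOX.charAt(0))); // prints true
    }
}
```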
The new quirks mode test passes before and after the changes to `PrettyPrintWriter`. The new XML 1.0 mode and XML 1.1 mode tests fail before the changes to `PrettyPrintWriter` with the exception given above. The new XML 1.0 mode and XML 1.1 mode tests pass after the changes to `PrettyPrintWriter`.

Fixes #336