Bug: the concept of jstrencode(1) appears to be both very wrong and very buggy #28
One thing that should be done, and which I will do, is to rename the …
Renamed utf8encode() to utf8_to_unicode() to be less confusing: although converting a code point to Unicode is called encoding in JSON (according to all sources we have seen), according to the encoders **AND** decoders out there, a code point in a string should be converted to Unicode. A number of bugs were identified in jstrencode(1) during discussion in 'the other repo' (or one of the 'other repos'). This code is in jstrencode(1) now (as of yesterday); before yesterday it was in jstrdecode(1) because of the unfortunate swap in names. The swap happened because, while focusing on issue #13 (the decoding bug of `\uxxxx`, which turned out to be an encoding bug), we lost sight of the fact that the jstrencode(1) tool is not strictly UTF-8 but rather JSON. The man page now lists these bugs, so it is important to remove them from it once the bugs are fixed. A new issue, #28, has been opened for these problems.
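To make the terminology concrete, here is a minimal TypeScript sketch of the two conversions being discussed: turning `\uxxxx` escapes (including a UTF-16 surrogate pair) into the characters they denote, and turning a code point into its UTF-8 bytes. It is not the C code behind jstrencode(1) or jstrdecode(1); the helper names `decodeUnicodeEscapes` and `utf8Bytes` are hypothetical.

```typescript
// Hypothetical helper: replace each \uXXXX escape with the UTF-16 code unit it names.
// Because JS strings are UTF-16, a high/low surrogate pair such as
// \uD83D\uDE00 combines into a single character automatically.
function decodeUnicodeEscapes(s: string): string {
  return s.replace(/\\u([0-9a-fA-F]{4})/g, (_m: string, hex: string) =>
    String.fromCharCode(parseInt(hex, 16)),
  );
}

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes (RFC 3629).
function utf8Bytes(cp: number): number[] {
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError(`not a Unicode scalar value: ${cp}`);
  }
  if (cp < 0x80) return [cp]; // 1 byte: ASCII
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 2 bytes
  if (cp < 0x10000) {
    // 3 bytes: rest of the Basic Multilingual Plane
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    // 4 bytes: supplementary planes (e.g. emoji)
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// String.raw keeps the backslashes, so `raw` contains the escapes as text.
const raw = String.raw`caf\u00e9 \uD83D\uDE00`;
const decoded = decodeUnicodeEscapes(raw);
console.log(decoded); // café 😀
console.log(utf8Bytes(decoded.codePointAt(3)!).map((b) => b.toString(16))); // [ 'c3', 'a9' ]
```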
Something I do not understand, looking at the so-called JSON spec and the JavaScript output, is how invalid escape sequences are handled. I presume this is either because I am not understanding the standard correctly (if we can call anything about it 'correct' :-) ) or because I am tired and not reading the grammar right. I rather suspect it's a bit of both.
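For reference, the JSON grammar (RFC 8259, section 7) allows only these escapes after a backslash: `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, and `\u` followed by exactly four hex digits. Anything else, such as `\c`, is invalid JSON, even though a JavaScript string *literal* accepts `\c` and treats it as plain `c`. A small TypeScript sketch (the helper names `isValidJsonEscape` and `parses` are hypothetical) that checks an escape against that list and confirms the result with the standard `JSON.parse`:

```typescript
// Characters the JSON grammar (RFC 8259, section 7) allows right after a backslash,
// besides 'u', which must be followed by exactly four hex digits.
const SIMPLE_ESCAPES = new Set(['"', "\\", "/", "b", "f", "n", "r", "t"]);

// Hypothetical helper: does the backslash at index i of a raw JSON string body
// start a valid escape sequence?
function isValidJsonEscape(body: string, i: number): boolean {
  if (i + 1 >= body.length) return false; // trailing lone backslash
  const next = body[i + 1];
  if (SIMPLE_ESCAPES.has(next)) return true;
  return next === "u" && /^[0-9a-fA-F]{4}$/.test(body.slice(i + 2, i + 6));
}

// JSON.parse agrees: a valid escape parses, an invalid one throws a SyntaxError.
function parses(jsonText: string): boolean {
  try {
    JSON.parse(jsonText);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidJsonEscape(String.raw`a\bc`, 1), parses(String.raw`"a\bc"`)); // true true
console.log(isValidJsonEscape(String.raw`a\cb`, 1), parses(String.raw`"a\cb"`)); // false false
```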
Is there an existing issue for this?
Describe the bug
Based on how JavaScript encodes JSON, there appear to be multiple problems with our tool.
First of all, it appears that converting `\` to `\\` is wrong. I will give examples in the "What you expect" section.

The next problem is that code points should be converted to Unicode symbols, just as with jstrdecode(1). This is how JavaScript does it too: both encode and decode need to do this.

Another problem is that `\` escape chars should not be handled the way we handle them. See "What you expect" for examples.

What you expect
In the order of the problems above, here are examples of what jstrencode(1) does and what JavaScript does.
If we have the JavaScript:
it SHOULD (i.e. this is what JavaScript does) encode to:
but our tool converts the string to:
Notice the double `\` before the `b`! As for the escaped quotes at the beginning, see the "Anything else?" section.

Now, as far as the code points go, the JavaScript:
converts to:
but our tool does this:
or as just a string:
Now if the `\uxxxx` were converted to a Unicode symbol, it might just be that the `\"` surrounding the output would be the only difference, but I am not sure of this.

The third problem appears to be even worse. There might be other cases where something like this happens, but anyway, the JavaScript:
... turns into:
Notice how the `\c` has the `\` silently removed and the `c` stands by itself (or rather, it comes after the Unicode symbol and before the `o`).

But our tool does something extremely wrong. First, as a string by itself:
... or as JSON with `{}`:

As can be seen, the encoding concept we have appears to be totally wrong.
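When comparing against JavaScript, it may help to separate two layers: escapes written in JavaScript source are resolved by the JavaScript parser before `JSON.stringify` ever sees the string, whereas a command-line argument (assuming it is single-quoted in the shell) keeps its backslashes. This TypeScript sketch uses only the standard `JSON.stringify` and `String.raw` to show both cases; it says nothing about how jstrencode(1) itself is implemented.

```typescript
// In JavaScript source, escapes are resolved by the parser, before JSON.stringify runs.
const backspace = "a\bc"; // 3 chars: 'a', U+0008 BACKSPACE, 'c'
const plainC = "a\cb"; // \c is not an escape: the parser drops the backslash, giving "acb"
// String.raw keeps the backslash, much like a single-quoted shell argument 'a\bc' would.
const literalBackslash = String.raw`a\bc`; // 4 chars: 'a', '\', 'b', 'c'

console.log(JSON.stringify(backspace)); // "a\bc"   (the backspace is re-escaped as \b)
console.log(JSON.stringify(plainC)); // "acb"    (no backslash is left to encode)
console.log(JSON.stringify(literalBackslash)); // "a\\bc"  (a real backslash is doubled)
console.log(JSON.stringify("café")); // "café"   (non-ASCII is emitted as-is, not as \u00e9)
```

If the JavaScript examples above were typed as string literals, the missing backslash before `c` may come from the JavaScript parser rather than from JSON encoding, which is worth ruling out before comparing outputs.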
Environment
jparse_bug_report.sh output
n/a
Anything else?
As for the escaped `"` surrounding the string: I guess it depends on how this tool will be used, but even so it would appear to be the wrong default.

As for the doubling of `\`, it also depends on how we need to use it.

But clearly, the handling of the `\` in invalid escape chars seems to be wrong. Of course, it could be because I am tired or because I do not really know JavaScript, but I think it probably isn't either of those. Then again, it depends on where our tool needs to deviate from a normal JSON encoder ....
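For comparison with a conventional JSON encoder, here is a minimal TypeScript sketch of string encoding as RFC 8259 requires it: `"` and `\` must be escaped, code points below U+0020 must be escaped (using the short forms where they exist, `\u00XX` otherwise), and every other character, including non-ASCII, may be emitted as-is. This is not jstrencode(1)'s algorithm. The name `encodeJsonString` is hypothetical, the `quote` parameter is an illustrative assumption reflecting the question of whether surrounding quotes should be the default, and lone surrogates are not handled (`JSON.stringify` escapes them; this sketch does not).

```typescript
// Short escapes JSON defines for specific characters (RFC 8259, section 7).
const SHORT = new Map<string, string>([
  ['"', '\\"'],
  ["\\", "\\\\"],
  ["\b", "\\b"],
  ["\f", "\\f"],
  ["\n", "\\n"],
  ["\r", "\\r"],
  ["\t", "\\t"],
]);

// Hypothetical encoder: escape only what JSON requires and leave everything else alone.
// `quote` controls whether the result is wrapped in double quotes (assumed option).
function encodeJsonString(s: string, quote = true): string {
  let out = "";
  for (const ch of s) {
    const short = SHORT.get(ch);
    const cp = ch.codePointAt(0)!;
    if (short !== undefined) {
      out += short;
    } else if (cp < 0x20) {
      // Other control characters have no short form: use \u00XX (lowercase hex).
      out += "\\u" + cp.toString(16).padStart(4, "0");
    } else {
      out += ch; // includes non-ASCII such as "é"; no \uXXXX needed
    }
  }
  return quote ? `"${out}"` : out;
}

// A real backslash, quotes, a control character and a non-ASCII letter.
const sample = String.raw`a\b "q" ` + "\u0001 é";
console.log(encodeJsonString(sample)); // "a\\b \"q\" \u0001 é"
console.log(encodeJsonString(sample) === JSON.stringify(sample)); // true
console.log(encodeJsonString(sample, false)); // a\\b \"q\" \u0001 é
```

Matching `JSON.stringify` on well-formed input is the point of the design: any deviation in our tool would then be a deliberate choice rather than an accident.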