Description
Is there an existing issue for this?
- I have searched for existing issues and did not find anything like this
Describe the bug
Compared with how JavaScript encodes JSON, there appear to be multiple problems with our tool.
First of all, it appears that converting `\` to `\\` is wrong. I give examples in the What you expect section.
The next problem is that `\uxxxx` code points should be converted to Unicode characters, just as jstrdecode(1) does. JavaScript does the same: both encoding and decoding need to do this.
Another problem is that escapes introduced by `\` that are not valid escape characters (such as `\c`) should not be handled the way we handle them now. See What you expect for examples.
What you expect
Taking the problems above in order, here are examples of what jstrencode(1) does and what JavaScript does.
If we have the JavaScript:
```javascript
const json_to_encode = {
    name: "\u0f0ff\bo"
};
```
it SHOULD (i.e. this is what JavaScript does) encode to:
```
{"name":"༏f\bo"}
```
but our tool converts the string to:
```
$ jstrencode '"\u0f0ff\bo"'
\"\\u0f0ff\\bo\"
```
Notice the doubled `\` before the `b`! As for the escaped quotes at the beginning, see the Anything else? section.
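For reference, here is a minimal Node.js sketch of the JavaScript side of this first example, assuming that "JavaScript encoding" here means `JSON.stringify`. It separates two steps that are easy to conflate: the `\u0f0f` and `\b` escapes are interpreted by the JavaScript parser when it reads the string literal, and `JSON.stringify` only ever sees the already-decoded characters, escaping the backspace again on output.

```javascript
// Sketch (Node.js): the JavaScript side of the first example, step by step.

// Step 1: the JavaScript parser decodes the escapes in the string literal,
// so "\u0f0f" becomes U+0F0F and "\b" becomes U+0008 (backspace).
const json_to_encode = {
    name: "\u0f0ff\bo"
};

// The decoded string holds four code points: U+0F0F, "f", backspace, "o".
const codePoints = [...json_to_encode.name].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints.join(" ")); // U+0F0F U+0066 U+0008 U+006F

// Step 2: JSON.stringify escapes the quote, the backslash and control
// characters. The backspace is written back out as the two characters \b;
// U+0F0F is left as is.
const encoded = JSON.stringify(json_to_encode);
console.log(encoded); // {"name":"༏f\bo"}
```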
Now, as far as the code points go, the JavaScript:
```javascript
const json_to_encode = {
    name: "\u0f0f"
};
```
converts to:
```
{"name":"༏"}
```
but our tool does this:
```
$ jstrencode '{"name": "\u0f0f"}'
{\"name\": \"\\u0f0f\"}
```
or, as just a string:
```
$ jstrencode '"\u0f0f"'
\"\\u0f0f\"
```
If `\uxxxx` were converted to a Unicode character, the only remaining difference might be the `\"` surrounding the output, but I am not sure of this.
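A minimal Node.js sketch of both directions for this code point, again assuming `JSON.stringify` and `JSON.parse` are the behavior to compare against. Note that `JSON.stringify` does not emit a `\u` escape for a character like U+0F0F at all; it only uses `\u` for control characters that have no short escape and for lone surrogates.

```javascript
// Sketch (Node.js): encoding leaves U+0F0F as a literal character.
const encoded = JSON.stringify({ name: "\u0f0f" });
console.log(encoded); // {"name":"༏"}

// Decoding: this JSON text contains the six characters \u0f0f,
// and JSON.parse turns them back into the single character U+0F0F.
const decoded = JSON.parse('{"name":"\\u0f0f"}');
console.log(decoded.name === "\u0f0f"); // true
console.log(decoded.name.length); // 1

// Round trip: re-encoding the decoded value gives the same JSON text.
console.log(JSON.stringify(decoded) === encoded); // true
```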
The third problem appears to be even worse. There might be other cases where something like this happens, but take the JavaScript:
```javascript
const json_to_encode = {
    name: "\u0f0ff\co"
};
```
... turns into:
```
{"name":"༏fco"}
```
Notice how the `\` in `\c` is silently removed, leaving the `c` by itself (or rather, it now follows the Unicode character and the `f`, and precedes the `o`).
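The following Node.js sketch separates the two places where a `\c` could be handled. As far as I can tell, the silent removal comes from the JavaScript string-literal parser, not from `JSON.stringify`. JSON itself only allows the escapes `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t` and `\uXXXX`, so `JSON.parse` rejects `\c`. This is offered only as a data point for deciding what jstrencode(1) should do with such input.

```javascript
// Sketch (Node.js): where the backslash in \c actually gets dropped.

// 1. In a JavaScript string literal, \c is not a recognised escape sequence,
//    so the parser keeps only the "c".
const fromLiteral = "\u0f0ff\co";
console.log(fromLiteral === "\u0f0ffco"); // true
console.log(JSON.stringify({ name: fromLiteral })); // {"name":"༏fco"}

// 2. In JSON text, \c is not a valid escape, so JSON.parse rejects it.
let jsonParseRejected = false;
try {
    JSON.parse('"\\u0f0ff\\co"'); // the JSON text "\u0f0ff\co"
} catch (err) {
    jsonParseRejected = err instanceof SyntaxError;
}
console.log(jsonParseRejected); // true
```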
But our tool does something extremely wrong. First, as a string by itself:
```
$ jstrencode '"\u0f0ff\co"'
0;xexyl@xexyz:~$
```
... or as the JSON with `{}`:
```
$ jstrencode {"name":"\u0f0ff\co"}
0;xexyl@xexyz:~$
```
As can be seen, the encoding approach we have appears to be totally wrong.
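To make the JavaScript behavior concrete, here is a minimal Node.js sketch of a string encoder that reproduces `JSON.stringify`'s escaping for well-formed strings: escape `"` and `\`, use the short escapes for backspace, form feed, newline, carriage return and tab, use `\u00XX` for the remaining control characters, and copy everything else (including non-ASCII) unchanged. The name `encodeJsonString` is hypothetical and this is not how jstrencode(1) is written. It also takes already-decoded text as input, so it does not settle whether the tool should first interpret `\uxxxx` sequences in its argument.

```javascript
// Hypothetical helper (not part of jparse): encode a JavaScript string as a
// JSON string literal the way JSON.stringify does, for strings that contain
// no lone surrogates (JSON.stringify would escape those; this sketch does not).
const SHORT_ESCAPES = new Map([
    ['"', '\\"'],
    ["\\", "\\\\"],
    ["\b", "\\b"],
    ["\f", "\\f"],
    ["\n", "\\n"],
    ["\r", "\\r"],
    ["\t", "\\t"],
]);

function encodeJsonString(str) {
    let out = '"';
    for (const ch of str) {
        if (SHORT_ESCAPES.has(ch)) {
            // Quote, backslash and the five control characters with short forms.
            out += SHORT_ESCAPES.get(ch);
        } else if (ch < " ") {
            // Remaining control characters U+0000..U+001F become \u00xx (lowercase hex).
            out += "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0");
        } else {
            // Everything else, including non-ASCII such as U+0F0F, is copied as is.
            out += ch;
        }
    }
    return out + '"';
}

console.log(encodeJsonString("\u0f0ff\bo")); // "༏f\bo"
console.log(encodeJsonString("\u0f0ff\bo") === JSON.stringify("\u0f0ff\bo")); // true
```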
Environment
- OS: Linux for the tool tests, macOS for JavaScript, but this should really be n/a
- Device: n/a
- Compiler: n/a
jparse_bug_report.sh output
n/a
Anything else?
As for the escaped `"` surrounding the string: I guess it depends on how this tool will be used, but even so it appears to be the wrong default.
As for the doubling of `\`, that also depends on how we need to use it.
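One possible explanation for both the escaped quotes and the doubled backslashes (an observation, not a confirmed reading of the jstrencode(1) source): the tool's current output matches what you get by treating the whole argument as raw text, including the surrounding `"` and every `\`, and encoding that text as the inside of a JSON string. This Node.js sketch reproduces the three tool outputs shown earlier that way.

```javascript
// Sketch (Node.js): treat each shell argument as raw text (the backslashes
// are real characters, not JavaScript escapes) and JSON-encode that text.
const rawInputs = [
    '"\\u0f0ff\\bo"',      // the text passed as '"\u0f0ff\bo"'
    '{"name": "\\u0f0f"}', // the text passed as '{"name": "\u0f0f"}'
    '"\\u0f0f"',           // the text passed as '"\u0f0f"'
];

// JSON.stringify wraps the result in quotes; drop them to get just the body.
const bodies = rawInputs.map((raw) => JSON.stringify(raw).slice(1, -1));
for (const body of bodies) {
    console.log(body);
}
// Prints:
// \"\\u0f0ff\\bo\"
// {\"name\": \"\\u0f0f\"}
// \"\\u0f0f\"
```

If the tool is meant to produce text that can be pasted between the quotes of a JSON string, this doubling is expected; if its input is meant to already be JSON, it is not.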
But the way we handle `\` before invalid escape characters does seem clearly wrong.
Of course, it could be that I am tired or that I do not really know JavaScript, but I think it probably isn't either of those. Then again, it depends on what our tool needs to do that deviates from a normal JSON encoder ....