Question: Issues that aren’t really major issues but are still issues #14

Open
3 tasks done
lcn2 opened this issue Sep 11, 2024 · 239 comments
Labels: 42 The answer! :-)

Comments

@lcn2
Contributor

lcn2 commented Sep 11, 2024

Because some issues are not a real major issue, which is an issue.

  • <<=== click this box
  • <<=== or click this
  • <<=== not an issue
@lcn2
Contributor Author

lcn2 commented Sep 11, 2024

We worked on a family member's health issue for the last 48 hours.

We hope to address questions posed in issue #13 tomorrow, assuming you need us to answer them by tomorrow (else update them accordingly).

@xexyl
Owner

xexyl commented Sep 11, 2024

Because some issues are not a real major issue, which is an issue.

  • <<=== click this box
  • <<=== or click this
  • <<=== not an issue

Thanks for the laugh!

@xexyl
Owner

xexyl commented Sep 11, 2024

We worked on a family member's health issue for the last 48 hours.

We hope to address questions posed in issue #13 tomorrow, assuming you need us to answer them by tomorrow (else update them accordingly).

Best wishes!

I have a kind of busy day today but I am hoping to look at the issue more soon.

I have one thought that might help but even so some of the comments if nothing else need replies (for discussion).

Thanks and no worries on my behalf! Health comes first always.

@xexyl
Owner

xexyl commented Sep 11, 2024

I have a great idea for the Makefiles: a new control variable for the test suites.

I did it already in jparse but haven't committed yet: I have to take care of other things first.

Anyway I will also do it here and the dbg and dyn_array repos: it's VERBOSITY and default is 0 but it allows one to do something like:

make VERBOSITY=1 test

so one can more easily run the tests at a chosen verbosity through the Makefile, without having to figure out the right command line. (The same could apply to the bug report if I added such an option, which might not be the case as we always want it very verbose.)

I have changes in jparse that can't be committed yet but when back at the laptop I will commit the things that can and then sync over.

Usually one will not want to change the value of the variable but during testing it might be good. What's more is that many times verbosity would have been useful for the test workflow so we can now do that too.

I have to go through photos from the past couple days, today, and I am planning on making courgette (or if you prefer zucchini) bread but otherwise until sometime early afternoon I hope to work on the repos.

@xexyl
Owner

xexyl commented Sep 12, 2024

FYI: it is quite likely that tomorrow I won't be able to spend as much time working on these things. I've not been able to get some other things done that need to be taken care of. They could wait but I don't want to get too behind.

It might be I still get some done but I'm not sure how much. We shall see. Anyway I'm done for the day here and at the other repos too.

Hope the medical situations are going okay or will be soon!

@xexyl
Owner

xexyl commented Sep 14, 2024

My mum is in hospital right now .. just emergency at this point. Hypertension and vertigo. She was coherent. They've taken a blood draw (no results yet) and they have her on IV for hydration.

But this has taken extra time away. I did make a few fixes in jstrdecode (minor) and I was going to sync to mkiocccentry repo along with other repos that might not be synced but the above happened and so I'm unable to do that now. I have to go take care of other things. If I don't get to this today I'll do it tomorrow. Left some comments in the thread about decoding bugs but that's possibly all I will do today. Sorry for the delays!

@lcn2
Contributor Author

lcn2 commented Sep 14, 2024

My mum is in hospital right now .. just emergency at this point. Hypertension and vertigo. She was coherent. They've taken a blood draw (no results yet) and they have her on IV for hydration.

But this has taken extra time away. I did make a few fixes in jstrdecode (minor) and I was going to sync to mkiocccentry repo along with other repos that might not be synced but the above happened and so I'm unable to do that now. I have to go take care of other things. If I don't get to this today I'll do it tomorrow. Left some comments in the thread about decoding bugs but that's possibly all I will do today. Sorry for the delays!

Sad "turn of events", as the expression goes.

Please take care of yourself and help her as needed.

Please present my "best compliments", as the 18th century expression went, to your mom.

@xexyl
Owner

xexyl commented Sep 14, 2024

My mum is in hospital right now .. just emergency at this point. Hypertension and vertigo. She was coherent. They've taken a blood draw (no results yet) and they have her on IV for hydration.

But this has taken extra time away. I did make a few fixes in jstrdecode (minor) and I was going to sync to mkiocccentry repo along with other repos that might not be synced but the above happened and so I'm unable to do that now. I have to go take care of other things. If I don't get to this today I'll do it tomorrow. Left some comments in the thread about decoding bugs but that's possibly all I will do today. Sorry for the delays!

Sad "turn of events", as the expression goes.

Please take care of yourself and help her as needed.

Please present my "best compliments", as the 18th century expression went, to your mom.

She says thank you!

She's doing a lot better.

Sometimes she has a problem with dehydration. This has happened before but it didn't dawn on me until she was already there.

And that's what they determined too.

@xexyl
Owner

xexyl commented Sep 15, 2024

QUESTION - is this correct behaviour of jstrencode ?

Is this correct?

$ jstrencode -Qe '\"foo"'
\\\"foo

@lcn2
Contributor Author

lcn2 commented Sep 15, 2024

QUESTION - is this correct behaviour of jstrencode ?

Is this correct?

$ jstrencode -Qe '\"foo"'
\\\"foo

No, it is not correct. The trailing double quote should NOT have been removed!
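
For comparison, here is a minimal sketch of plain JSON string encoding of that argument, using Python's json module rather than jstrencode itself. It assumes nothing about what -Q and -e do (this thread does not spell that out), and the helper name json_encode_body is made up for illustration. It only shows that every character of the argument, including the trailing double quote, is accounted for in the encoded body.

```python
import json


def json_encode_body(text: str) -> str:
    """Encode text as a JSON string, then drop only the two enclosing quotes.

    Hypothetical helper for illustration; this is not how jstrencode is written.
    """
    encoded = json.dumps(text)  # inner quotes and backslashes get escaped
    return encoded[1:-1]


# The argument from the command above: backslash, quote, f, o, o, quote.
arg = '\\"foo"'
body = json_encode_body(arg)
print(body)
```

In this sketch the body still ends with an escaped double quote, which is the part the reported output lost.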

@xexyl
Owner

xexyl commented Sep 15, 2024

QUESTION - is this correct behaviour of jstrencode ?

Is this correct?

$ jstrencode -Qe '\"foo"'
\\\"foo

No, it is not correct. The trailing double quote should NOT have been removed!

So I thought.

I can look at it tomorrow unless you want to take care of it.

@lcn2
Contributor Author

lcn2 commented Sep 16, 2024

QUESTION - is this correct behaviour of jstrencode ?

Is this correct?

$ jstrencode -Qe '\"foo"'
\\\"foo

No, it is not correct. The trailing double quote should NOT have been removed!

So I thought.

I can look at it tomorrow unless you want to take care of it.

This issue has been fixed in PR #18

The fix from that PR has been applied to the mkiocccentry repo via commit 4d60badafca319f126e31b72165c473348b51055

@xexyl
Owner

xexyl commented Sep 16, 2024

QUESTION - is this correct behaviour of jstrencode ?

Is this correct?

$ jstrencode -Qe '\"foo"'
\\\"foo

No, it is not correct. The trailing double quote should NOT have been removed!

So I thought.

I can look at it tomorrow unless you want to take care of it.

This issue has been fixed in PR #18

The fix from that PR has been applied to the mkiocccentry repo via commit 4d60badafca319f126e31b72165c473348b51055

Thank you! Well done!

@xexyl
Owner

xexyl commented Sep 18, 2024

head-injured-hadi-law

@xexyl
Owner

xexyl commented Sep 19, 2024

head-injured-hadi-law

I knew you would be amused as I know you can also read upside down and sideways and diagonally and mirrored and all the combinations.

I was looking for a picture of a hoarding (or if you prefer billboard) that could be easily doctored for another meme I wanted to make (based on a joke that popped into my head last night or so) and then I saw this.

It does remind me of a computer shop that used to be here. The owner deliberately put the sign upside down to draw attention (it worked of course but then most people probably couldn't read it easily or at all). But it made me laugh.

I don't know if this one is deliberate or not but something tells me that it was a mistake.

@xexyl
Owner

xexyl commented Sep 19, 2024

As for the decoding bug I really don't know what to do about it. I have been pondering trying to use the decoder you linked to before. It would have to be modified a bit and I agree with your idea about putting it into its own file. It is a real shame but if it works maybe it would be good to do at least for now.

But it might be good to do some research and study and experimenting with the problem to try and get it to work.

Perhaps the next step we should take is to make a test directory with minimal code with the UTF-8 to decode hard coded so we can more easily test it.

I think I will do that and hopefully tomorrow. But if you want to experiment with it please do so.
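
A minimal sketch of what such a hard-coded test could look like, written in Python for brevity rather than in the repo's own code. The function name decode_u_escapes and the test cases are made up for illustration, only \uXXXX escapes are handled (other JSON escapes are out of scope), and nothing here is taken from the actual decoder under discussion.

```python
import re

# Matches exactly one \uXXXX escape.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text: str) -> bytes:
    """Turn backslash-u escapes, including surrogate pairs, into UTF-8 bytes."""
    chars = []
    i = 0
    while i < len(text):
        m = _U_ESCAPE.match(text, i)
        if m is None:
            chars.append(text[i])  # ordinary character, copied as-is
            i += 1
            continue
        unit = int(m.group(1), 16)
        i = m.end()
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate escape.
            low = _U_ESCAPE.match(text, i)
            if low is None or not 0xDC00 <= int(low.group(1), 16) <= 0xDFFF:
                raise ValueError("high surrogate without a low surrogate")
            unit = 0x10000 + ((unit - 0xD800) << 10) + (int(low.group(1), 16) - 0xDC00)
            i = low.end()
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unexpected low surrogate")
        chars.append(chr(unit))
    return "".join(chars).encode("utf-8")


# Hard-coded inputs paired with the UTF-8 bytes they should decode to.
CASES = [
    ("caf\\u00e9", b"caf\xc3\xa9"),
    ("\\uD83D\\uDC09", b"\xf0\x9f\x90\x89"),
]

for source, expected in CASES:
    print(source, "ok" if decode_u_escapes(source) == expected else "MISMATCH")
```

Keeping the inputs and expected bytes in one table makes it easy to add a failing case as soon as one is found.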

But with that being said I need sleep. Good night!

@xexyl
Owner

xexyl commented Sep 19, 2024

Oh one more thing. I believe that you were friends with Paul Erdős which is why I am sharing this with you. I know one of the judges was and pretty sure it was you. Anyway:

https://www.muckrock.com/news/archives/2015/jul/21/nothing-indicate-nothing-indicate-subject-had-any-/

And good night now.

@lcn2
Contributor Author

lcn2 commented Sep 19, 2024

Oh one more thing. I believe that you were friends with Paul Erdős which is why I am sharing this with you. I know one of the judges was and pretty sure it was you. Anyway:

https://www.muckrock.com/news/archives/2015/jul/21/nothing-indicate-nothing-indicate-subject-had-any-/

And good night now.

Erdös said that both "Joe" (his way of referring to the Soviet Union Government - Joe as in "Joseph Stalin") and "Sam" (his way of referring to the US Government - Sam as in "Uncle Sam") were interested in him. He was happy to talk about math and did so in the west .. but avoided the Soviet Union.

@xexyl
Owner

xexyl commented Sep 19, 2024

Oh one more thing. I believe that you were friends with Paul Erdős which is why I am sharing this with you. I know one of the judges was and pretty sure it was you. Anyway:

https://www.muckrock.com/news/archives/2015/jul/21/nothing-indicate-nothing-indicate-subject-had-any-/

And good night now.

Erdös said that both "Joe" (his way of referring to the Soviet Union Government - Joe as in "Joseph Stalin") and "Sam" (his way of referring to the US Government - Sam as in "Uncle Sam") were interested in him. He was happy to talk about math and did so in the west .. but avoided the Soviet Union.

Okay so was it an umlaut or what I copy pasted?

As for Iosef (as I think his birth name spelling was) Stalin I won't even get started on.

The other one I won't get into either but for different reasons.

I am not in the least bit surprised he avoided the Soviet Union in discussion but it's interesting, all parts.

Thanks for the story!

I hope to start looking at the decoding issue soon but I have other things I have to do today too.

In the meantime I do have to go for now.

@lcn2
Contributor Author

lcn2 commented Sep 19, 2024

Okay so was it an umlaut or what I copy pasted?

Erdös is a Hungarian name, and he spelled it with an ö.

We knew Erdös well, and we did some mathematics together. There is even an Erdös-Noll conjecture.

@xexyl
Owner

xexyl commented Sep 19, 2024

Okay so was it an umlaut or what I copy pasted?

Erdös is a Hungarian name, and he spelled it with an ö.

I knew the former (though not well) but I have seen both with an umlaut and the other diacritic (which I am not sure what it even is). So it's the umlaut. Thanks.

We knew Erdös well, and we did some mathematics together. There is even an Erdös-Noll conjecture.

The last part sounds lovely! I guess you wrote about it together?

@xexyl
Owner

xexyl commented Sep 21, 2024

Here's a great one for you .. I was about to go back to sleep. I was in the middle of a word when this happened. Fortunately I was fast enough to grab this screenshot. This is the best one I have seen yet and that's saying a lot.
IMG_1911

@xexyl
Owner

xexyl commented Sep 21, 2024

On bats:

--

The first known use of bats was on the Cal State University networks
$TALK chat program. The author of the above poem is unknown. (And of course, the original poem did not contain web links.)

--

I read that quite a while back (maybe years ago but not sure). Anyway do you refer to talkd ? I guess that's what you mean. I used to really enjoy that one. A relic of the past as they say.

@lcn2
Contributor Author

lcn2 commented Sep 22, 2024

On bats:

--

The first known use of bats was on the Cal State University networks
$TALK chat program. The author of the above poem is unknown. (And of course, the original poem did not contain web links.)

--

I read that quite a while back (maybe years ago but not sure). Anyway do you refer to talkd ? I guess that's what you mean. I used to really enjoy that one. A relic of the past as they say.

Yes.

@xexyl
Owner

xexyl commented Sep 22, 2024

On bats:

The first known use of bats was on the Cal State University networks
$TALK chat program. The author of the above poem is unknown. (And of course, the original poem did not contain web links.)

I read that quite a while back (maybe years ago but not sure). Anyway do you refer to talkd ? I guess that's what you mean. I used to really enjoy that one. A relic of the past as they say.

Yes.

Thanks. Why do you do it like $TALK though? I have this vague memory but I'm not sure ..

BTW I found some typos, one of which you might want to fix and the other maybe too :-)

First: "Is it a bat! Here are some examples:" but I think you mean "It is a bat!" :-)

The second one is in the poem the word 'i' is not capitalised.

@xexyl
Owner

xexyl commented Sep 22, 2024

BTW:

HAPPY BIRTHDAY TO YOU,
HAPPY BIRTHDAY TO YOU,
HAPPY BIRTHDAY DEAR BILBO BAGGINS,
HAPPY BIRTHDAY TO YOU!

and ..

HAPPY BIRTHDAY TO YOU,
HAPPY BIRTHDAY TO YOU,
HAPPY BIRTHDAY DEAR FRODO BAGGINS,
HAPPY BIRTHDAY TO YOU!

and ...

HAPPY HOBBIT DAY TO YOU!


I'm not sure what I'll do today but I think I might take a break and just read something .. not sure if that's LR or The Hobbit itself but I might very well do that .. along with other things of course.

@xexyl
Owner

xexyl commented Sep 23, 2024

few-people-know-that-the-reason-apple-was-named-apple

@xexyl
Owner

xexyl commented Sep 23, 2024

few-people-know-that-the-reason-apple-was-named-apple

The reference to an apple being in every Apple computer is part of a hilarious episode of The IT Crowd. Well they all are but this is the first one I saw and so I went back and watched the earlier ones and all the ones after it too. They didn't say it was a hoax/scam but I added that because of Jobs' behaviour and treatment of others.

@xexyl
Owner

xexyl commented Sep 25, 2024

FYI: with commit f29677f I added a fun file jparse.json which holds some information about this repo including the primary tools. Why?

I would say 'Because!' but the real reason is some other projects have something like this AND because this is a JSON parser so it seems fun. In the process I discovered some typos in comments in the README.md file and some of the tools.

I've not synced this to the other repo and I won't today ... going to try and work on some other things that I have not (well I did somewhat but I did not go as much as I could have). Maybe. I will also be getting food sometime soon too so I probably won't get much more done with that.

Something else came up which has unfortunately messed up part of this day but that's another matter entirely.

@xexyl
Owner

xexyl commented Nov 8, 2024

QUESTION

I wonder about something in json_README.md that I think is wrong. It might be because of the swap of terms; I'm not sure, but I believe not. Anyway, you had there (with decoded/encoded swapped for reasons discussed some time back):

_NOTE_: A **JSON encoded string** is _not_ allowed in a [JSON
document](#json-document).  To be [valid JSON](#valid-json), all [JSON
strings](#json-string) _must_ be [JSON decoded strings](#json-decoded-string).

(I fixed a mistake above that and it's what made me see this mistake or what I think to be a mistake.)

The thing is the example you gave encoded is certainly valid JSON. For instance:

$ cat foo6.json
"Ã"
$ jparse -v 1 foo6.json
valid JSON

Do you think that note is wrong and should be deleted? If yes I'll do that tomorrow .. off now.

@lcn2
Contributor Author

lcn2 commented Nov 9, 2024

We have been focused too much on getting the IOCCC ready to look at this encode/decode issue much.

We do have some qualms about the current usage of what encoding and decoding mean in terms of JSON.

For example, the jstrencode(1) man page reads:

jstrencode encodes JSON decoded strings given on the command line.

We disagree. The jstrencode(1) encodes strings according to the so-called JSON spec.

This is a string. It was never a "JSON encoded string". This is not a "JSON decoded string". It is just a string. It was typed in as a string that ends in a newline.

If you wanted to turn the above string into a JSON string, you would need to encode it according to the so-called JSON specification. In particular you would need to convert the internal double quotes into backslashed-double quotes. Also the final newline would need to become a backslash-n. Finally you need to enclose the entire thing in double quotes.

Thus, the JSON encoding of the above mentioned string would be:

"This is a string. It was never a \"JSON encoded string\". This is not a \"JSON decoded string\". It is just a string. It was typed in as a string that ends in a newline.\n"

The above is a JSON encoded string. If that JSON encoded string were fed into jstrdecode(1) it should produce a string. The jstrdecode(1) tool decodes JSON encoded strings to produce a string.

Now it happens that if you use the jstrencode(1) tool to JSON encode a string (and thus produce a JSON encoded string), the resulting JSON encoded string can be converted back into a string by using the jstrdecode(1) tool.
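
The encoding steps described above can be sketched as follows, in Python rather than with the jparse tools. The function encode_json_string is a made-up illustration: it handles only backslashes, double quotes and newlines (the backslash step is an addition needed so the result stays decodable), and it is compared against Python's json module, not against jstrencode(1).

```python
import json


def encode_json_string(text: str) -> str:
    """Hand-rolled sketch: escape, then enclose in double quotes.

    Other control characters are not handled here.
    """
    body = text.replace("\\", "\\\\")  # backslashes first, so later escapes are not doubled
    body = body.replace('"', '\\"')    # internal double quotes become backslash-quote
    body = body.replace("\n", "\\n")   # the final newline becomes backslash-n
    return '"' + body + '"'            # finally, enclose the whole thing in double quotes


plain = 'It was never a "JSON encoded string".\n'
encoded = encode_json_string(plain)
print(encoded)

# Decoding the encoded form gives back the plain string.
decoded = json.loads(encoded)
print(decoded == plain)
```

For this input the hand-rolled result matches what json.dumps produces, and decoding it returns the original string, which is the round trip described above.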

Returning to the man pages, we would write for jstrencode(1):

jstrencode encodes a command line string according to the so-called JSON specification.

And for jstrdecode(1):

jstrdecode converts a JSON encoded string into a normal string.

Or something along those lines.

@xexyl
Owner

xexyl commented Nov 9, 2024

Great ideas. I have also wondered about the wording.

And no problem about the contest. Just pondering things out loud.

I will try and address this tomorrow.

@lcn2
Contributor Author

lcn2 commented Nov 9, 2024

Do you think that note is wrong and should be deleted? If yes I'll do that tomorrow .. off now.

You could fix it by stating:

**NOTE**: To be [valid JSON](#valid-json), all strings in a [JSON
document](#json-document) **must** be [JSON encoded strings](#json-edcoded-string).

@xexyl
Owner

xexyl commented Nov 9, 2024

Do you think that note is wrong and should be deleted? If yes I'll do that tomorrow .. off now.

You could fix it by stating:

**NOTE**: To be [valid JSON](#valid-json), all strings in a [JSON
document](#json-document) **must** be [JSON encoded strings](#json-edcoded-string).

I will get back to you on this. I have another question but I must go again.

Thanks!

@lcn2
Contributor Author

lcn2 commented Nov 9, 2024

I had a thought on error messages from our tools. Let's take the invalid JSON file foo5.json:

"\e"

The above string is 3 bytes long. It is an escape character enclosed in double quotes.

Excluding the enclosing double quotes, we have 1 byte.

It should be:

"\\e"

The above string is 4 bytes long. It is a backslash character followed by a lower case e character: all enclosed in double quotes.

Excluding the enclosing double quotes, we have 2 bytes.

This length distinction is important below.

Suppose you had a JSON document with a JSON encoded string that contained a single escape character. To be a valid JSON string, that JSON encoded string would need to be enclosed in double quotes.

When debugging at the -J 5 level, we would expect the output to be something like:

JSON tree[3]:	lvl: 2	type: JTYPE_STRING	len{p,c:q}: 1	value: "\e"

Note that the length is 1, not 2.

Here the value: "\e" part refers to what the JSON string decodes into (i.e., its value) while keeping the debug output to a single line.

That's why it prints "value: ". It is showing you the value of the decoded JSON string while being enclosed in double quotes (so you know where it begins and ends in case there is leading and trailing whitespace), all on a single line (so things like newlines become \n to preserve the single debug line).

It is NOT showing you the JSON string encoding. It is NOT showing you the decoding of the JSON string.

Printing the following would be a mistake as well as not informative:

JSON tree[3]:	lvl: 2	type: JTYPE_STRING	len{p,c:q}: 1	value: "\\e"

The above "value: " is WRONG because \\e is 2 characters enclosed in double quotes, not 1.

Again, the debug output at -J 5 is showing the values of various JSON elements. It is NOT showing you JSON.

In the case of JSON string elements, it is NOT showing you JSON encoded strings.

Moreover, it is NOT showing the decoding of a JSON string directly. It cannot show you the decoding directly (i.e., what jstrdecode(1) would produce) because it needs to:

  1. Keep the debug output to a single line

  2. Surround the value output with double quotes

That is also why -J 5 prints "value: " instead of "decoded: ". It cannot display the true decoding of the JSON string and do those two things.
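
The two constraints above can be sketched like this, in Python rather than in the parser's own code. The function debug_value and its escape table are hypothetical, chosen only to illustrate "one line, wrapped in double quotes". The real -J 5 output routine may render bytes differently.

```python
def debug_value(raw: bytes) -> str:
    """Render decoded bytes on a single line, wrapped in double quotes.

    Hypothetical rendering rules for illustration only.
    """
    names = {0x0A: "\\n", 0x09: "\\t", 0x0D: "\\r", 0x1B: "\\e"}
    parts = []
    for b in raw:
        if b in names:
            parts.append(names[b])         # well-known control bytes get a short name
        elif b < 0x20 or b >= 0x7F:
            parts.append("\\x%02x" % b)    # other non-printable bytes are shown as hex
        else:
            parts.append(chr(b))           # printable ASCII is shown as-is
    return '"' + "".join(parts) + '"'


value = b"\x1b"  # one byte: the escape character
print("len:", len(value), "value:", debug_value(value))
```

The reported length counts the decoded bytes (1 here), while the printed value is only a one-line rendering of them, so the two are not expected to agree character for character.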

Thus, we disagree with your proposed debug output changes, and especially with GH-issuecomment-2465721564.

UPDATE 0a

Adding some flag that changes the default debug output to enable a "preserve Unicode symbols when printing JSON string debug values" mode is certainly possible if you so desire. Such a "change the debug output behavior" flag would not impact the default debug behavior.

@lcn2
Contributor Author

lcn2 commented Nov 9, 2024

Do you think that note is wrong and should be deleted? If yes I'll do that tomorrow .. off now.

You could fix it by stating:

**NOTE**: To be [valid JSON](#valid-json), all strings in a [JSON
document](#json-document) **must** be [JSON encoded strings](#json-edcoded-string).

UPDATE 0

We are just reacting to what you quoted. Perhaps our suggestion in the larger context of that document is reasonable?

@xexyl
Owner

xexyl commented Nov 9, 2024

Hmm ... okay I see what you mean too. I actually almost wrote a reply on something along those lines. I'll reverse it but keep the bug fix in place. Then I'll open a pull request over there. But I'll not do that just yet. Need to be more awake.

Thanks for your feedback!

@xexyl
Owner

xexyl commented Nov 9, 2024

We have been focused too much on getting the IOCCC ready to look at this encode/decode issue much.

We do have some qualms about the current usage of what encoding and decoding mean in terms of JSON.

For example, the jstrencode(1) man page reads:

jstrencode encodes JSON decoded strings given on the command line.

We disagree. The jstrencode(1) encodes strings according to the so-called JSON spec.

This is a string. It was never a "JSON encoded string". This is not a "JSON decoded string". It is just a string. It was typed in as a string that ends in a newline.

If you wanted to turn the above string into a JSON string, you would need to encode it according to the so-called JSON specification. In particular you would need to convert the internal double quotes into backslashed-double quotes. Also the final newline would need to become a backslash-n. Finally you need to enclose the entire thing in double quotes.

Thus, the JSON encoding of the above mentioned string would be:

"This is a string. It was never a \"JSON encoded string\". This is not a \"JSON decoded string\". It is just a string. It was typed in as a string that ends in a newline.\n"

The above is a JSON encoded string. If that JSON encoded string were fed into jstrdecode(1) it should produce a string. The jstrdecode(1) tool decodes JSON encoded strings to produce a string.

Now it happens that if you use the jstrencode(1) tool to JSON encode a string (and thus produce a JSON encoded string), the resulting JSON encoded string can be converted back into a string by using the jstrdecode(1) tool.

Returning to the man pages, we would write for jstrencode(1):

jstrencode encodes a command line string according to the so-called JSON specification.

And for jstrdecode(1):

jstrdecode converts a JSON encoded string into a normal string.

Or something along those lines.

As far as this comment goes, this is true. The man page was modelled after the single line summary at the top of the source file. I'll update not only the man pages but the header/source files' top comments.

@xexyl
Owner

xexyl commented Nov 9, 2024

What's bizarre is that rolling back that change while keeping the bug fix does not revert it back to what it was before. So I'll have to look at this later on when more awake. I thought I was awake enough but I guess not. I'll have this in order later on today and I'll fix the comments and the man pages to match our discussion too.

Thank you for your valuable feedback and I hope the scientific holiday is going great! (I guess astronomy related but if something else I hope you're having just as much fun.)

@xexyl
Owner

xexyl commented Nov 9, 2024

Ah .. I know why. It was the input file. Well anyway I'll commit a while later. Once this is done I'll make a pull request over there. Thanks.

@xexyl
Owner

xexyl commented Nov 9, 2024

This part was just committed but not making a pull request until the wording issues have been resolved. Before that can happen I really do have to be more awake (for obvious reasons).

I do really appreciate your valuable feedback on this. I might note that I added a comment to the function that says why it does not print out unicode symbols and how it is meant to show raw data, not encoded/decoded strings, and if something like that was desired a new flag could be made.

As for whether or not I believe a new flag like that should be made. I think it's not worth it as I guess jprint(1) will be able to print this type of thing but then if debug output is requested one can see how it differs. I guess documentation on this function might be useful though: in particular the JSON debug output might want to be described a bit better. But we'll see.

@xexyl
Owner

xexyl commented Nov 9, 2024

Will reply to your other comments in a while. Best wishes!

@xexyl
Owner

xexyl commented Nov 9, 2024

I had a thought on error messages from our tools. Let's take the invalid JSON file foo5.json:

As far as this comment goes, I think you misunderstood.

What I was saying is that maybe it would be good for error message functions to take an extra arg which if not NULL would be the filename from which the parser is trying to parse.

Then if there's an error in the JSON it would show what file it is in. It would show the string too of course but it might help identify the file in some cases.
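
A rough sketch of that idea, in Python for brevity. The function name format_parse_error and its parameters are invented for illustration and do not correspond to any existing function in the parser. The optional filename plays the role of the "extra arg which if not NULL" described above.

```python
import sys
from typing import Optional


def format_parse_error(message: str, line: int, filename: Optional[str] = None) -> str:
    """Build an error message, prefixed with the file name when one is known."""
    prefix = f"{filename}: " if filename is not None else ""
    return f"{prefix}line {line}: {message}"


# Parsing a named file: the message says where the bad JSON lives.
print(format_parse_error("invalid JSON string", 1, "foo5.json"), file=sys.stderr)

# Parsing a string or stdin: no file name, so the message is unchanged.
print(format_parse_error("invalid JSON string", 1), file=sys.stderr)
```

Making the argument optional means existing callers that parse strings rather than files would not need to change.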

I don't know if I have the energy for that today and it's more important that I fix other things and also try and get the pending changes in the website committed as well but this I believe would be a worthwhile update.

@xexyl
Owner

xexyl commented Nov 9, 2024

I actually found statements in the man page that are no longer true, so it's good to update it even without the better wording. Thanks!

@xexyl
Owner

xexyl commented Nov 9, 2024

Actually I have a concern about the change in definition to jstrencode. You wrote: jstrencode encodes a command line string according to the so-called JSON specification. But I am not sure this is quite right. Both of these are valid JSON:

"\uD83D\uDC09\uD83E\uDE84\uD83E\uDDD9"

and

"🐉🪄🧙"

and they are the same thing. That is when the first is encoded it turns into the second one. But although it does indeed encode the first into the second I am not sure it's encoding according to the JSON spec. It simply encodes the strings.
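
The equivalence of those two JSON strings can be checked with a short sketch, using Python's json module as a stand-in parser rather than jparse. The emoji are written with \U escapes in the source only to keep the example ASCII-safe.

```python
import json

# The same JSON string written two ways: as \uXXXX surrogate pairs, and as the symbols themselves.
escaped = '"\\uD83D\\uDC09\\uD83E\\uDE84\\uD83E\\uDDD9"'
literal = '"\U0001F409\U0001FA84\U0001F9D9"'

from_escaped = json.loads(escaped)
from_literal = json.loads(literal)

print(from_escaped == from_literal)  # the two documents hold the same value
print(len(from_escaped))             # three code points, one per symbol
```

Both inputs are accepted as valid JSON and parse to the same three code points, even though the two documents differ byte for byte.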

Now maybe I misunderstand you but I'm not sure if that's the case. Please advise when you have a chance. In the meantime I'll see if I can figure something out, hopefully today but possibly not.

UPDATE 0

Perhaps part of the functionality of the jstrencode(1) tool is a misnomer in some cases? Of course having encode(1) would be a big problem as it's highly likely something else is called that too but it's still not strictly JSON. Of course it could be that it does what you wrote and then also encodes other things too. I think that might work.

@lcn2
Contributor Author

lcn2 commented Nov 10, 2024

As for whether or not I believe a new flag like that should be made. I think it's not worth it as I guess jprint(1) will be able to print this type of thing

Yes, jprint(1), when given the binary bits that correspond to "\xf0\x9f\x8f\xb4\xf3\xa0\x81\xa7\xf3\xa0\x81\xa2\xf3\xa0\x81\xb3\xf3\xa0\x81\xa3\xf3\xa0\x81\xb4\xf3\xa0\x81\xbf", will need to, at a minimum, print the corresponding JSON.

There is more than one way to write such JSON. For example:

"\u00f0\u009f\u008f\u00b4\u00f3\u00a0\u0081\u00a7\u00f3\u00a0\u0081\u00a2\u00f3\u00a0\u0081\u00b3\u00f3\u00a0\u0081\u00a3\u00f3\u00a0\u0081\u00b4\u00f3\u00a0\u0081\u00bf"

Or:

"🏴󠁧󠁢󠁳󠁣󠁴󠁿"

There is also a combination of emoji/Unicode symbols that results in the Scottish flag 🏴󠁧󠁢󠁳󠁣󠁴󠁿, and so that combination can also be put in a JSON encoded string.

Our point is THERE IS MORE THAN ONE WAY to write JSON. This is an IMPORTANT point. There are several JSON strings that result in the same thing.
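
As one illustration of the "more than one way" point, here is a small sketch using Python's json module (not jprint(1) or jstrencode(1)). It starts from the bytes quoted above and produces two different JSON strings for them. The variable names are made up, and the two forms shown are simply the ones Python happens to offer, not an exhaustive list.

```python
import json

# The UTF-8 bytes quoted above, written as hex.
raw = bytes.fromhex(
    "f09f8fb4" "f3a081a7" "f3a081a2" "f3a081b3" "f3a081a3" "f3a081b4" "f3a081bf"
)
flag = raw.decode("utf-8")

ascii_form = json.dumps(flag)                     # only ASCII: \uXXXX surrogate pairs
utf8_form = json.dumps(flag, ensure_ascii=False)  # the symbols themselves, in quotes

print(ascii_form)
print(ascii_form != utf8_form)                    # the JSON text differs
print(json.loads(ascii_form) == json.loads(utf8_form) == flag)  # the value does not
```

The two JSON texts differ, yet both decode to the same seven code points, so neither can be called the one correct encoding.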

When debugging with -J 5 one needs to see how the bits that are within a JSON encoded string are put together: NOT how they might be used when that JSON string is decoded.

Now in the case of jprint(1), it needs to format the JSON with options to pretty print stuff that would include representing that long string of \u stuff as the emoji.

@xexyl
Owner

xexyl commented Nov 10, 2024

Agreed with your points in reply to my comment about jprint(1). Thanks.

I also wondered if there should be a jgrep(1). But I guess we can talk about that another time.

@xexyl
Owner

xexyl commented Nov 10, 2024

Do you think that note is wrong and should be deleted? If yes I'll do that tomorrow .. off now.

You could fix it by stating:

**NOTE**: To be [valid JSON](#valid-json), all strings in a [JSON
document](#json-document) **must** be [JSON encoded strings](#json-edcoded-string).

Minus the typo I see there (now that I'm both more awake and I have reading glasses on, not distance glasses on[0]) I still think this can't be right because as you the decoded version is the \uxxxx (or the UTF-16 version) and the encoded one is the Unicode symbol. But both are valid JSON. So I guess this line should be removed entirely. Or do you think of something I'm not thinking of?

[0] Isn't typing with phones so much fun?!

UPDATE 0

Also ... in some cases it must be in decoded form like with \u000e so it's also problematic for that reason. Maybe I'll delete the part and try and clean up the document. But for now I have to go afk.
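
The \u000e case can be checked with a short sketch, again using Python's json module as a stand-in for the parser. It assumes only what the JSON specification says about control characters inside strings. Nothing here is specific to jparse.

```python
import json

# A JSON string containing the raw control character U+000E between the quotes.
raw_control = '"' + "\x0e" + '"'
try:
    json.loads(raw_control)
    accepted = True
except json.JSONDecodeError:
    accepted = False
print(accepted)

# The same character written as an escape is fine, and it is what an encoder emits.
escaped = json.dumps("\x0e")
print(escaped)
print(json.loads(escaped) == "\x0e")
```

So for control characters like this one there is effectively only the escaped spelling, unlike the emoji case where both spellings are valid.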

@lcn2
Contributor Author

lcn2 commented Nov 10, 2024

Perhaps part of the functionality of the jstrencode(1) tool is a misnomer in some cases? Of course having encode(1) would be a big problem as it's highly likely something else is called that too but it's still not strictly JSON. Of course it could be that it does what you wrote and then also encodes other things too. I think that might work.

The purpose of jstrencode(1) is to exercise the "convert stuff into a valid JSON encoded string according to the so-called JSON spec" functionality in the JSON parser.

It is important to note that there is MORE THAN ONE WAY to encode stuff and produce a JSON string. Perhaps a possible source of confusion and concern is due to this MORE THAN ONE WAY factor. And by MORE THAN ONE we mean many ways!

As long as jstrencode(1) produces something that is a valid JSON string according to the so-called JSON spec, jstrencode(1) has done its primary job.

As long as jstrdecode(1) produces something that is EQUIVALENT to the original, given a valid JSON encoding of the original string, jstrdecode(1) has done its primary job.

Now we say EQUIVALENT and not equal because of the ways we can encode and decode.

What is ideal for testing purposes is that when a string is given to jstrencode(1), and the JSON encoded string that jstrencode(1) produces is then given to jstrdecode(1), we get the original string back. Again, this is for testing purposes.

That this can happen for testing purposes is useful. Nevertheless, there are different paths for JSON string encoding. That someone else might produce a slightly different JSON encoded string is to be expected. This is not a bug, it is a feature of the so-called JSON spec.
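
The round trip described above can be sketched with Python's `json` module standing in for jstrencode(1) and jstrdecode(1). That substitution is an assumption made for illustration only: the real tools may choose different escapes, and the sample text is made up.

```python
import json

# Hypothetical sample input: a tab and one non-ASCII character (e with acute).
original = "tab\there: caf\u00e9"

# One encoder escapes non-ASCII as \uXXXX, the other keeps it as-is.
escaped = json.dumps(original)
literal = json.dumps(original, ensure_ascii=False)

# The two JSON encoded strings are not equal ...
assert escaped != literal

# ... but they are EQUIVALENT: each decodes back to the original string.
assert json.loads(escaped) == original
assert json.loads(literal) == original
```

Both outputs are valid JSON encoded strings, so a test that compares encoder output byte for byte would be stricter than the round trip requires.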

If we wanted exact and perfect translation, we would use XML. 🤓

We hope this helps.

@xexyl
Owner

xexyl commented Nov 11, 2024

I will try replying to your comment tomorrow.

@xexyl
Owner

xexyl commented Nov 12, 2024

Perhaps part of the functionality of the jstrencode(1) tool is a misnomer in some cases? Of course having encode(1) would be a big problem as it's highly likely something else is called that too but it's still not strictly JSON. Of course it could be that it does what you wrote and then also encodes other things too. I think that might work.

The purpose of jstrencode(1) is to exercise the "convert stuff into a valid JSON encoded string according to the so-called JSON spec" functionality in the JSON parser.

It is important to note that there is MORE THAN ONE WAY to encode stuff and produce a JSON string. Perhaps a possible source of confusion and concern is due to this MORE THAN ONE WAY factor. And by MORE THAN ONE we mean many ways!

As long as jstrencode(1) produces something that is a valid JSON string according to the so-called JSON spec, jstrencode(1) has done its primary job.

As long as jstrdecode(1) produces something that is EQUIVALENT to the original, given a valid JSON encoding of the original string, jstrdecode(1) has done its primary job.

Now we say EQUIVALENT and not equal because of the ways we can encode and decode.

What is ideal for testing purposes is that when a string is given to jstrencode(1), and the JSON encoded string that jstrencode(1) produces is then given to jstrdecode(1), we get the original string back. Again, this is for testing purposes.

That this can happen for testing purposes is useful. Nevertheless, there are different paths for JSON string encoding. That someone else might produce a slightly different JSON encoded string is to be expected. This is not a bug, it is a feature of the so-called JSON spec.

If we wanted exact and perfect translation, we would use XML. 🤓

We hope this helps.

That might be so. But that's why I wonder about the text:

To be [valid JSON](#valid-json), all **JSON string**s must be [JSON decoded
strings](#json-decoded-string) in a [JSON document](#json-document).

because both decoded and encoded strings are allowed. For instance:

$ cat fire.json
{"\uD83D\uDD25":"🔥",
 "decoded":"\uD83D\uDD25",
 "encoded":"🔥" }
$ jparse -v 1 fire.json
valid JSON

Now you're right. There are different ways to encode/decode but one of the decoded forms of the fire emoji is what is above and it's perfectly valid JSON.
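
As a cross-check of the fire.json example, here is a minimal sketch that parses the same text with Python's `json` module. Using Python in place of jparse is an assumption for illustration; it only shows what a conforming parser yields after parsing.

```python
import json

# The text of fire.json, held in a raw string so that the \u escapes
# reach the JSON parser untouched.
fire_json = r'''{"\uD83D\uDD25":"🔥",
 "decoded":"\uD83D\uDD25",
 "encoded":"🔥" }'''

doc = json.loads(fire_json)

# After parsing, the escaped and the literal spelling are the same string.
assert doc["decoded"] == doc["encoded"] == "🔥"

# The escaped member name also parses to the emoji itself.
assert "🔥" in doc
```

Both spellings sit inside double quotes in the file, and both parse to the same one-character string.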

That's why I think that that text should be removed. Does that seem reasonable to do?

Thanks.

@lcn2
Contributor Author

lcn2 commented Nov 12, 2024

That might be so. But that's why I wonder about the text:

To be valid JSON, all JSON strings must be JSON decoded strings in a JSON document.

because both decoded and encoded strings are allowed.

The "To be valid JSON, all JSON strings must be JSON decoded strings" statement is NOT correct.

The "because both decoded and encoded strings are allowed" statement is NOT correct.

Decoded strings are NOT valid JSON.

Perhaps there is some confusion with the term encoded as it applies to JSON encoded strings.

JSON string encoding, at a minimum, requires the string to be surrounded by double quotes. At a minimum, encoding will result in prepending and appending a double quote character.

Consider this string (note it does not have a trailing newline):

foo

The string above is NOT valid JSON. It is not a JSON encoded string.

To JSON encode the above, we need to do this (also does not have a trailing newline):

"foo"

The string above is a valid JSON encoded string.

When we decode the above JSON encoded string, we get our original string back:

foo

Let's look at a more complex string:

This "string" has a newline
in the middle and at the end

The above string is NOT valid JSON. It is not a properly JSON encoded string.

When we JSON encode the above string, we get (note it does not have a trailing newline):

"This \"string\" has a newline\nin the middle and at the end\n"

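The two encodings shown so far can be reproduced with a short sketch. It uses Python's `json.dumps` as a stand-in encoder, which is an assumption for illustration and not a claim about how jstrencode(1) is implemented.

```python
import json

# The two unencoded strings from the examples above.
simple = "foo"
complex_ = 'This "string" has a newline\nin the middle and at the end\n'

# Encoding adds the surrounding double quotes and escapes any embedded
# double quotes and newlines.
encoded_simple = json.dumps(simple)
encoded_complex = json.dumps(complex_)

print(encoded_simple)
print(encoded_complex)

# Decoding each result returns the string we started with.
assert json.loads(encoded_simple) == simple
assert json.loads(encoded_complex) == complex_
```

The encoded form of the second string occupies a single line because both newlines have become the two-character escape `\n`.
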
Returning to our first string without a trailing newline: foo.

Recall we encoded that string as "foo".

There are more ways to JSON encode that string. Here is one of them: "\u0066oo"

Here is another: "f\u006fo"

And another: "\u0066\u006fo"

Etc.
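
A minimal sketch of the same point, with each escape written in full `\uXXXX` form and Python's `json.loads` used as an assumed stand-in decoder:

```python
import json

# Four different JSON encoded strings (raw strings keep the backslashes).
encodings = [
    r'"foo"',
    r'"\u0066oo"',       # \u0066 is the letter f
    r'"f\u006fo"',       # \u006f is the letter o
    r'"\u0066\u006fo"',
]

# Collect the distinct strings they decode to.
decoded = {json.loads(text) for text in encodings}

# All four spellings decode to the one original string.
assert decoded == {"foo"}
```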

We hope this helps.

UPDATE 0

This string is not a valid JSON string

🔥

because it is not properly JSON encoded.

This is a valid JSON encoded string:

"🔥"

That same string may be JSON encoded as:

"\uD83D\uDD25"

Both of those are valid JSON encodings of the string:

🔥
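
The `\uD83D\uDD25` form is a UTF-16 surrogate pair: 🔥 is U+1F525, which does not fit in a single four-digit `\u` escape. The sketch below derives the pair with the standard surrogate arithmetic. The helper name `surrogate_escape` is hypothetical, and Python's `json.loads` is an assumed stand-in decoder.

```python
import json


def surrogate_escape(ch: str) -> str:
    """Return the two \\uXXXX escapes for one character above U+FFFF."""
    offset = ord(ch) - 0x10000
    if offset < 0:
        raise ValueError("character fits in a single \\uXXXX escape")
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return f"\\u{high:04X}\\u{low:04X}"


escape = surrogate_escape("🔥")
print(escape)

# Wrapped in double quotes, the escape is a JSON encoded string that
# decodes to the same single character as the literal form.
assert json.loads('"' + escape + '"') == json.loads('"🔥"') == "🔥"
```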

UPDATE 1

To be valid JSON, all strings in a JSON document must be JSON encoded strings.

@xexyl
Owner

xexyl commented Nov 12, 2024

That might be so. But that's why I wonder about the text:
To be valid JSON, all JSON strings must be JSON decoded strings in a JSON document.

because both decoded and encoded strings are allowed.

The "To be .." statement is NOT correct.

The "because both decoded and encoded strings are allowed" is NOT correct.

Decoded strings are NOT valid JSON.

Perhaps there is some confusion with the term encoded as it applies to JSON encoded strings.

JSON string encoding, at a minimum, requires the string to be surrounded by double quotes. At a minimum, encoding will involve prepending and appending a double quote character.

Consider this string (note it does not have a trailing newline):

foo

The string above is NOT valid JSON. It is not a JSON encoded string.

To JSON encode the above, we need to do this (also does not have a trailing newline):

"foo"

The string above is a valid JSON encoded string.

When we decode the above JSON encoded string, we get our original string back:

foo

Let's look at a more complex string:

This "string" has a newline
in the middle and at the end

The above string is NOT valid JSON. It is not a properly JSON encoded string.

When we JSON encode the above string, we get (note it does not have a trailing newline):

"This \"string\" has a newline\nin the middle and at the end\n"

Returning to our first string without a trailing newline: foo.

Recall we encoded that string as "foo".

There are more ways to JSON encode that string. Here is one of them: "\u0066oo"

Here is another: "f\u006fo"

And another: "\u0066\u006fo"

Etc.

We hope this helps.

Yes. That's what I was saying. So I will remove that statement. Thank you. I'll do a pull request over there but probably not today. Have too many other things on my mind right now.

@xexyl
Owner

xexyl commented Nov 12, 2024

Hmm .. actually I think more of that section is problematic. What do you think it should say? Can't easily copy paste the whole thing but I think this might work? Clearly the references to decoded, in the way they're used, are incorrect. But maybe this would be okay?

## JSON string

A **JSON string** is zero or more
[JSON decoded](#json-decoded) 
characters
enclosed in double quotes (`"`). A
**JSON string** may be [JSON decoded](#json-decoded) or [JSON
encoded](#json-encoded-string).

As noted above, a **JSON string** may be empty (zero bytes long). 
As represented in JSON, such an empty (zero bytes long)
**JSON string** would be `""`.


Note that there are multiple ways to represent the same thing. 
For instance the string `"\uD83D\uDD25"`, when encoded, is `"🔥"`.

Note that to simplify inputting this I changed the text to not use code blocks with syntax highlighting but that can be kept in when changing it.

@lcn2
Contributor Author

lcn2 commented Nov 12, 2024

It seems far worse than we had originally thought. See issue 2752 in the other repo and comment there if you prefer.

@xexyl
Owner

xexyl commented Nov 14, 2024

It's even worse than what you thought. First of all sorry for the swap in names. That happened because I lost focus of the fact it's a JSON encoder, when focusing on the \uxxxx encoding/decoding bug.

I am glad you discovered the problem, however, because I have uncovered a few bugs and incongruences with JavaScript encoding. As it is I have to do more testing with decoding (in the same way). One of the bugs is quite bad.

I must leave now but I will hopefully be able to update the website tools to use jstrdecode(1) which fortunately seems to be working correctly or at least correctly for the website!

I know you're on limited Internet access but when you're back home I hope you can help with the problem as it seems quite bad. Of course, as I noted over in the new issue (#28), depending on what we need, some of these bugs might be something that can be added as a feature, though in that case the encoding functions would, it seems, become far more complicated.

Thanks again! I would apologise for losing focus of the encoding aspect but it turned out this is a good thing that I did and that you discovered the problem with the website as I have now unearthed a number of problems with jstrencode!

@xexyl
Owner

xexyl commented Nov 14, 2024

Oh great. The syncing of jparse to mkiocccentry causes compile errors. I wonder why .. unless it's some other change. Seems like it might be missing an #include or something else bad happened. Ah .. I think I have an idea why, due to renaming of functions.

UPDATE 0

Yes that was it. I had the same issue when I swapped the tool/function names.

@xexyl
Owner

xexyl commented Nov 14, 2024

Ugh. I accidentally committed (in a rush) the other changes. So that's why make test failed.

So I will have to roll back some changes. Later on today.
