Unicode / Encoding issues #19

Open
larswise opened this issue Feb 18, 2019 · 5 comments

Comments

@larswise

larswise commented Feb 18, 2019

I'm facing some encoding issues with the client.
The problem occurs when using non-ASCII characters, more precisely æøåÆØÅ etc.

client.jsonset("test", Path.rootPath(), {'name': 'test111', 'items': []})

client.jsonget('test') --> {'name': 'test111', 'items': []}

client.jsonarrinsert('test', Path('.items'), 0, {'company': 'Åre', 'destination': 'ÅS', 'origin': 'LØR'})

client.jsonget('test')

This does not look correct:
{"name":"test111","items":[{"company":"\u00c3\u0085re","destination":"\u00c3\u0085S","origin":"L\u00c3\u0098R"},]}

What I expected:
{"name":"test111","items":[{"company":"\u00c5re","destination":"\u00c5S","origin":"L\u00d8R"},]}
or
{"name":"test111","items":[{"company":"\xc5re","destination":"\xc5S","origin":"L\xd8R"},]}

If I save them as strings, they appear to get the correct encoding, but then my array elements are turned into strings instead of objects.

If I'm doing it wrong, I'd be grateful for any tips! :)
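For comparison, Python's own json module produces the first expected form: with the default ensure_ascii=True, json.dumps escapes each non-ASCII code point (Å is U+00C5, Ø is U+00D8) exactly once as \uXXXX. A minimal standard-library sketch using the item from the report:

```python
import json

# The array item from the report, as a Python dict.
item = {"company": "Åre", "destination": "ÅS", "origin": "LØR"}

# ensure_ascii=True (the default) escapes each non-ASCII code point once;
# compact separators make the output look like the ReJSON reply.
escaped = json.dumps(item, separators=(",", ":"))
print(escaped)
# {"company":"\u00c5re","destination":"\u00c5S","origin":"L\u00d8R"}

# Round-tripping through json.loads gives back the original characters.
assert json.loads(escaped) == item
```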

@bentsku
Contributor

bentsku commented Feb 20, 2019

Hello!

I just tried it too and got the same result with rejson-py. I also checked with the ReJSON CLI tool, to no avail: the problem stays the same. See the attached screenshot.
[Screenshot: 2019-02-21 at 00 51 17]

I think the problem comes from ReJSON's internal encoding, not from the Python client. Maybe you could check whether there is an open issue in the ReJSON repository, or open one there to see if they can help you?
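One way to see why a server-side serializer is the likely culprit: the observed output is exactly what you get if each UTF-8 byte above 0x7F is escaped as its own \u00XX sequence, instead of escaping the decoded code point. The function below is a hypothetical emulation of that behavior to illustrate the hypothesis; it is not ReJSON's actual code.

```python
def escape_bytes_as_codepoints(s: str) -> str:
    """Hypothetical buggy escaper: treats every UTF-8 byte as a code point.

    Quote and control-character escaping are ignored for brevity.
    """
    out = []
    for b in s.encode("utf-8"):
        if b < 0x80:
            out.append(chr(b))         # ASCII passes through unchanged
        else:
            out.append("\\u%04x" % b)  # each non-ASCII byte escaped separately
    return "".join(out)

print(escape_bytes_as_codepoints("Åre"))  # \u00c3\u0085re
print(escape_bytes_as_codepoints("LØR"))  # L\u00c3\u0098R
```

Å is encoded in UTF-8 as the two bytes 0xC3 0x85, which is why one character turns into two escapes in the reply.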

@larswise
Author

larswise commented Mar 2, 2019

I did manage to get around it:

In Python I am able to restore the string by re-encoding it as follows:
somevalue.encode('utf-8').decode('unicode-escape').encode('latin1').decode('utf-8')
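The one-liner above operates on the raw reply text, where the \uXXXX escapes are still literal backslash sequences. Broken into commented steps (repair_escaped is just an illustrative name, not part of any library):

```python
def repair_escaped(somevalue: str) -> str:
    # 1. str -> bytes; the \uXXXX escape sequences themselves are plain ASCII.
    raw = somevalue.encode("utf-8")
    # 2. Interpret the backslash escapes: "\u00c3" becomes the single char U+00C3.
    unescaped = raw.decode("unicode-escape")
    # 3. For double-encoded input every char is now <= U+00FF, so Latin-1
    #    maps each one back to a single byte: the original UTF-8 bytes.
    utf8_bytes = unescaped.encode("latin1")
    # 4. Decode those bytes as UTF-8 to get the intended text.
    return utf8_bytes.decode("utf-8")

print(repair_escaped(r"L\u00c3\u0098R"))  # LØR
```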

And similarly in .NET, after fetching with JSON.MGET:

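using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

		// Undo the double encoding: turn each \uXXXX escape back into a char,
		// reinterpret those chars as Latin-1 bytes, then decode the bytes as UTF-8.
		public static string GetEncoded(params string[] strings)
		{
			var lat1 = Encoding.GetEncoding("iso-8859-1");
			var rx = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");
			var combined = string.Join(",", strings);
			// Group 1 holds the four hex digits of each escape.
			var result = rx.Replace(combined, match => ((char)int.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
			var lat1bytes = lat1.GetBytes(result);
			return Encoding.UTF8.GetString(lat1bytes);
		}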
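If the reply has already been parsed (for example with json.loads), the escapes are gone and each string holds one mis-decoded character per UTF-8 byte, so only the Latin-1/UTF-8 step is needed. A minimal sketch, assuming every affected string was double-encoded the same way; repair_tree is a hypothetical helper name:

```python
import json

def repair_tree(value):
    """Recursively re-decode every string in a parsed JSON value."""
    if isinstance(value, str):
        try:
            # Chars <= U+00FF map back to the original UTF-8 bytes.
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            return value  # does not look double-encoded; leave it alone
    if isinstance(value, list):
        return [repair_tree(v) for v in value]
    if isinstance(value, dict):
        return {repair_tree(k): repair_tree(v) for k, v in value.items()}
    return value  # numbers, booleans, None

reply = '{"name":"test111","items":[{"company":"\\u00c3\\u0085re","origin":"L\\u00c3\\u0098R"}]}'
print(repair_tree(json.loads(reply)))
# {'name': 'test111', 'items': [{'company': 'Åre', 'origin': 'LØR'}]}
```

The try/except is a heuristic: correctly encoded text such as "é" fails the UTF-8 decode and is returned unchanged, but it cannot tell every genuine Latin-1 string apart from a mangled one.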

@mschipperheyn

I'm having problems as well:

JSON.SET foo . '"bãr"'
OK
JSON.GET foo .
"\"b\\u00c3\\u00a3r\""

When I remove the duplicated backslashes and decode the result, I get bãr.
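redis-cli displays the reply with an extra layer of quoting; the underlying JSON text is "b\u00c3\u00a3r". Decoding that in Python and reversing the Latin-1 step confirms that the two UTF-8 bytes of ã (0xC3 0xA3) were escaped one by one. A small standard-library sketch:

```python
import json

# The reply as JSON text, i.e. without redis-cli's extra quoting.
reply = r'"b\u00c3\u00a3r"'

mangled = json.loads(reply)  # one char per UTF-8 byte: 'b', 'Ã', '£', 'r'
fixed = mangled.encode("latin-1").decode("utf-8")
print(fixed)  # bãr
```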

@bentsku
Contributor

bentsku commented Jul 22, 2019

I believe the JSON.GET command now has a NOESCAPE option that returns special characters unescaped, as mentioned in the replies to this issue. Maybe we could expose it as an option of the Python command? I can try to add it if wanted.

RedisJSON/RedisJSON#98
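A sketch of how such an option might be threaded through on the client side. build_jsonget_args and its no_escape parameter are hypothetical and not part of rejson-py; the sketch assumes the module accepts NOESCAPE before the paths, following the syntax JSON.GET <key> [INDENT ...] [NEWLINE ...] [SPACE ...] [NOESCAPE] [path ...].

```python
from typing import List

def build_jsonget_args(key: str, *paths: str, no_escape: bool = False) -> List[str]:
    """Hypothetical helper: assemble a JSON.GET command, optionally with NOESCAPE."""
    args = ["JSON.GET", key]
    if no_escape:
        # Options precede the paths; NOESCAPE asks the server to send
        # non-ASCII characters as raw UTF-8 instead of \uXXXX escapes.
        args.append("NOESCAPE")
    args.extend(paths or (".",))  # default to the root path
    return args

print(build_jsonget_args("test", no_escape=True))
# ['JSON.GET', 'test', 'NOESCAPE', '.']
```

With a connected redis-py client r, the list could be sent via r.execute_command(*build_jsonget_args("test", no_escape=True)), and the raw reply then parsed with json.loads(reply.decode("utf-8")). This part has not been tested against a server.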

@gkorland
Contributor

@bentsku, if you can submit a PR, that would be great.
