Handle unicode and str in exporters for py2 #274
Before this commit, only byte strings were exported in py2, and only unicode strings were exported in py3.
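Here is a minimal sketch of the direction this PR takes: an exporter-side check that accepts both string types on either Python version. It assumes the exporter decides what to export with an `isinstance` test and that byte strings are UTF-8 encoded. `text_type`, `is_string_value` and `to_text` are hypothetical names, not part of the library's API.

```python
# Hypothetical py2/py3 compatibility helpers; names are illustrative only.
try:
    text_type = unicode  # Python 2: the text type is `unicode`
except NameError:
    text_type = str      # Python 3: the text type is `str`


def is_string_value(value):
    """Return True for both byte strings and text strings.

    On Python 2 `bytes` is an alias of `str`, so this covers str/unicode;
    on Python 3 it covers bytes/str.
    """
    return isinstance(value, (bytes, text_type))


def to_text(value, encoding='utf-8'):
    """Decode byte strings to text (assumes UTF-8); return others unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

With this check, a value such as `b'caf\xc3\xa9'` and the text value `u'café'` are both treated as strings and exported as the same text, regardless of the interpreter version.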
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here.
Hey, I signed the CLA :)
CLAs look good, thanks!
I know the tests don't pass on py3, but before refining this I'd like confirmation that I understood the goal correctly and am not heading in the wrong direction.
Thanks for the PR @guewen. Using the … In general, though, this problem is a big can of worms, and this PR makes it clear that we need to be more careful about internal use of the …

If I understand correctly, it looks like the library assumes string-valued attribute values are always …

So in Python 2.x:

```python
>>> type(b'abc')
str
>>> type(b'abc'.decode('utf-8'))
unicode
>>> isinstance(b'abc'.decode('utf-8'), str)
False
```

And in Python 3.x:

```python
>>> type(b'abc')
bytes
>>> type(b'abc'.decode('utf-8'))
str
```

We have to support both versions of Python, and we have to support non-ASCII characters in attribute values. But the spec also says to truncate these strings to 256 bytes without specifying an encoding. In both Python 2 and 3, decoding a byte string with any valid encoding gets you a …
Which is all to say: the direction looks good, but there may be some unintended consequences.
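The 256-byte truncation mentioned above can be sketched as follows. This assumes UTF-8 as the encoding (the spec, as described in the comment, does not name one) and a text input; `truncate_to_bytes` is a hypothetical helper, not part of the library. It encodes, slices to the byte limit, and decodes with the `'ignore'` error handler so that a multi-byte character split by the slice is dropped instead of producing invalid text.

```python
def truncate_to_bytes(value, max_bytes=256, encoding='utf-8'):
    """Truncate text so its encoded form is at most max_bytes long.

    Assumes UTF-8 by default. The encoded prefix can only be invalid at
    its very end (a character cut in half), and decoding with 'ignore'
    drops exactly that incomplete trailing sequence.
    """
    encoded = value.encode(encoding)
    if len(encoded) <= max_bytes:
        return value
    return encoded[:max_bytes].decode(encoding, 'ignore')
```

For example, `u'é' * 200` encodes to 400 UTF-8 bytes and truncates to 128 characters, while pure ASCII input keeps its first 256 characters. The encoding choice matters: under UTF-16 even ASCII characters take 2 bytes each, so fewer characters would fit in the same limit.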
Thanks for your detailed answer. In particular, I wasn't aware of the 256-byte truncation (I'm new to the subject).
I'm not sure I'm doing it right, but it's at least an opening for a discussion.
Fixes #273