UnicodeDecodeError parsing results #312

Open
tulsidas opened this issue Jan 22, 2020 · 9 comments

@tulsidas

I get the following error when piping to fpp. Apparently some files are binary, and that breaks it:

Traceback (most recent call last):
  File "/home/tulsi/PathPicker/src/processInput.py", line 84, in <module>
    doProgram(flags)
  File "/home/tulsi/PathPicker/src/processInput.py", line 53, in doProgram
    lineObjs = getLineObjs(flags)
  File "/home/tulsi/PathPicker/src/processInput.py", line 20, in getLineObjs
    inputLines = sys.stdin.readlines()
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 1191: invalid start byte

python version: 2.7.17

@pcottle
Contributor

pcottle commented Jan 24, 2020 via email

@tulsidas
Author

> Can you try python3?

Same error :(

$ python --version
Python 3.7.6

Traceback (most recent call last):
  File "/home/tulsi/PathPicker/src/processInput.py", line 84, in <module>
    doProgram(flags)
  File "/home/tulsi/PathPicker/src/processInput.py", line 53, in doProgram
    lineObjs = getLineObjs(flags)
  File "/home/tulsi/PathPicker/src/processInput.py", line 20, in getLineObjs
    inputLines = sys.stdin.readlines()
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 4397: invalid start byte

@pcottle
Contributor

pcottle commented Jan 24, 2020

Ah ok, that's a helpful piece of context. Do you mind adding the most minimal stdout that repros the issue? I presume it's some kind of binary output that's piped straight into fpp?

@tulsidas
Author

Indeed, it's binary output being piped directly. I don't know if this is useful, but it's a use case:

$ grep dii
95203:�}X�d�ܗ;���5���-\0{����e�D�8�`�+ڒ�K���"�NX�)�
�R��@�s}W�����s}ldii�tڑ�v6x&EZa����ž��S�=a�x}
218194:���wxC�O]��n��'5�{����	��
�y�-�^떟����`��B�m�->8\���~����б����t}�����<D6@,��̞3a���q�pQ�$���~�9<�Vz/흩����.SX2��켨�L'�=ڒ��_��􄢛cj���;�w��o�n����2sY�H��1��g��G̤c	뎱��K�V?A�)ɯ�/2R'��bng��ݚ4H
                                   AeKJ�1V�K�0�ޗz��``�s|�DT`R�{>:ń�]��x�v�	�za��~9�ұ/0oR�����Y�R�MX:ؤ���L�)�v���m���=[oQ�-U	2ex=ֵ�dV
               g/I;�^Ъƺ�mϭ�����1�n�����z�      �k#p�I� ����ĭ�xT�2'��A�	�/�Ӽa�<@dii�Q����@,H������.sɰi2A9u��t�N�[�ʡ1c@����vk�A4��%�����~1��_O\��BB�o�9ΈB�����?�0h���]j>$3/��lu7��y����W�����àa&��$aQ1�,3u�(�K�q�-�O�
                                                                                        �w;=���*'է53?1$LTë��9��{�]m����D7|G;�.O2p9K%�H8�����b0EOq
                     C�t���O�

The same output, when piped to fpp, throws the previously mentioned error.

@pcottle
Contributor

pcottle commented Jan 27, 2020

Hrm I wasn't able to reproduce, but I imagine some of the binary data got lost between our pastes:

[pcottle:~:]$ cat foo.txt 
95203:�}X�d�ܗ;���5���-\0{����e�D�8�`�+ڒ�K���"�NX�)�
�R��@�s}W�����s}ldii�tڑ�v6x&EZa����ž��S�=a�x}
218194:���wxC�O]��n��'5�{����	��
�y�-�^떟����`��B�m�->8\���~����б����t}�����<D6@,��̞3a���q�pQ�$���~�9<�Vz/흩����.SX2��켨�L'�=ڒ��_��􄢛cj���;�w��o�n����2sY�H��1��g��G̤c	뎱��K�V?A�)ɯ�/2R'��bng��ݚ4H
                                   AeKJ�1V�K�0�ޗz��``�s|�DT`R�{>:ń�]��x�v�	�za��~9�ұ/0oR�����Y�R�MX:ؤ���L�)�v���m���=[oQ�-U	2ex=ֵ�dV
               g/I;�^Ъƺ�mϭ�����1�n�����z�      �k#p�I� ����ĭ�xT�2'��A�	�/�Ӽa�<@dii�Q����@,H������.sɰi2A9u��t�N�[�ʡ1c@����vk�A4��%�����~1��_O\��BB�o�9ΈB�����?�0h���]j>$3/��lu7��y����W�����àa&��$aQ1�,3u�(�K�q�-�O�
                                                                                        �w;=���*'է53?1$LTë��9��{�]m����D7|G;�.O2p9K%�H8�����b0EOq
                     C�t���O�[pcottle:~:]$ cat foo.txt  | fpp
No lines matched!!

@tulsidas
Author

Yes, there are surely many non-printable characters.

I was able to create a simple file, since the error states that it can't decode the byte 0x8b.

See attached a.txt file

cat a.txt | fpp

reproduces the error
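
For readers without the attachment, here is a minimal sketch of how such a file could be built. The contents of `SAMPLE` are an assumption (the real a.txt is not shown in this thread), and `write_sample` and `decodes_as_utf8` are hypothetical helper names. The only fact relied on is that 0x8b can never start a UTF-8 sequence, so a strict UTF-8 read fails with the same kind of error as in the traceback.

```python
import os
import tempfile

# Assumed stand-in for the attached a.txt: one path-like line, then the
# byte 0x8b, which is never a valid first byte of a UTF-8 sequence.
SAMPLE = b"src/processInput.py:20\n\x8b\n"


def write_sample(directory):
    """Write SAMPLE to <directory>/a.txt and return the path."""
    path = os.path.join(directory, "a.txt")
    with open(path, "wb") as handle:
        handle.write(SAMPLE)
    return path


def decodes_as_utf8(path):
    """Return True if the whole file decodes as strict UTF-8."""
    try:
        with open(path, encoding="utf-8") as handle:
            handle.readlines()
    except UnicodeDecodeError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    sample_path = write_sample(tmp)
    print(decodes_as_utf8(sample_path))  # prints False
```

A file written this way and piped through `cat a.txt | fpp` should exercise the same strict decoding path, although that has not been checked against fpp here.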

@pcottle
Contributor

pcottle commented Jan 29, 2020

Hrm the file input has an errors parameter:
https://stackoverflow.com/questions/35028683/python3-unicodedecodeerror-with-readlines-method/41652865

Do you know if there's an equivalent for sys.stdin.readlines? Seems like that would solve our issues.
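
One possible equivalent, sketched under assumptions: `sys.stdin.buffer` exposes the raw bytes behind stdin, and `io.TextIOWrapper` accepts the same `errors` argument as `open()`. The helper names `lenient_text_stream` and `read_input_lines` are hypothetical and are not part of PathPicker.

```python
import io
import sys


def lenient_text_stream(binary_stream, errors="replace"):
    """Wrap a binary stream so undecodable bytes do not raise."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", errors=errors)


def read_input_lines(binary_stream=None):
    """Read all lines, defaulting to the raw buffer behind sys.stdin."""
    if binary_stream is None:
        binary_stream = sys.stdin.buffer
    return lenient_text_stream(binary_stream).readlines()


# Demonstration on an in-memory stream instead of the real stdin.
demo = io.BytesIO(b"src/processInput.py:20\n\x8b binary junk\n")
print(read_input_lines(demo))
```

With `errors="replace"`, each undecodable byte becomes U+FFFD instead of raising. On Python 3.7 and later, `sys.stdin.reconfigure(errors="replace")` may be a shorter route to the same behavior, assuming `sys.stdin` is a regular `TextIOWrapper`.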

@tulsidas
Author

I know very little Python, but apparently the trick is to read from stdin in binary mode, like this:
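
A minimal sketch of binary-mode reading, which is an illustration and not necessarily the snippet originally posted. It reads raw lines and decodes each one separately. `read_lines_binary` is a hypothetical name, and the choice of the `backslashreplace` error handler is an assumption.

```python
import io


def read_lines_binary(binary_stream):
    """Read raw lines, then decode each one without raising on bad bytes."""
    lines = []
    for raw in binary_stream.readlines():
        # backslashreplace renders an undecodable byte such as 0x8b as the
        # four-character text \x8b, so nothing is silently dropped.
        lines.append(raw.decode("utf-8", errors="backslashreplace"))
    return lines


# Demonstration on an in-memory stream instead of the real stdin.
demo = io.BytesIO(b"README.md\n\x8b\x00\n")
print(read_lines_binary(demo))
```

In a script reading piped input, the call would be `read_lines_binary(sys.stdin.buffer)` after `import sys`. Decoding line by line keeps the readable lines intact, so path matching can still run on them.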

@pcottle
Contributor

pcottle commented Jan 29, 2020 via email

KapJI added the bug label Feb 22, 2021