UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position X: character maps to <undefined> #770

Closed
guilhermeferrari opened this issue Apr 26, 2021 · 4 comments · Fixed by #774

@guilhermeferrari

OS: Windows
Jupytext version: tried 1.10.0 and 1.11.1; the traceback below is from 1.11.1

I get this error both when using the pre-commit integration and when pairing a notebook manually with the CLI command:

$ jupytext --set-formats ipynb,py NOTEBOOKPATH
Traceback (most recent call last):
  File "c:\users\bruno.jsilveira\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\bruno.jsilveira\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\bruno.jsilveira\.virtualenvs\riscops_monitoring-UcxYZbWD\Scripts\jupytext.exe\__main__.py", line 7, in <module>
  File "c:\users\bruno.jsilveira\.virtualenvs\riscops_monitoring-ucxyzbwd\lib\site-packages\jupytext\cli.py", line 427, in jupytext
  File "c:\users\bruno.jsilveira\.virtualenvs\riscops_monitoring-ucxyzbwd\lib\site-packages\jupytext\cli.py", line 828, in jupytext_single_file
    write_pair(nb_file, formats, write_function)
  File "c:\users\bruno.jsilveira\.virtualenvs\riscops_monitoring-ucxyzbwd\lib\site-packages\jupytext\pairs.py", line 31, in write_pair
    value = write_one_file(alt_path, fmt)
  File "c:\users\bruno.jsilveira\.virtualenvs\riscops_monitoring-ucxyzbwd\lib\site-packages\jupytext\cli.py", line 821, in write_function
    lazy_write(
  File "c:\users\bruno.jsilveira\.virtualenvs\riscops_monitoring-ucxyzbwd\lib\site-packages\jupytext\cli.py", line 741, in lazy_write
    current_content = fp.read()
  File "c:\users\bruno.jsilveira\appdata\local\programs\python\python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 29343726: character maps to <undefined>

I fixed it by adding the parameter encoding="utf-8" to cli.py line 740.
https://github.com/mwouts/jupytext/blob/master/jupytext/cli.py#L740

...
  with open(path, encoding="utf-8") as fp:
      current_content = fp.read()
  modified = new_content != current_content
  if modified and args.diff:
...

Is that a correct fix? Or could it be something wrong with my notebook?

PS: On Linux and macOS it works fine; I only have this problem on Windows.
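
For anyone reading later, here is a minimal sketch of what seems to be going on, assuming the notebook is stored as UTF-8 and that open() on Windows falls back to cp1252 (as the traceback suggests). The sample character is my own choice and is not taken from the notebook in question:

```python
# "は" (U+306F) is encoded in UTF-8 as the bytes E3 81 AF, so it contains 0x81.
text = "は"
raw = text.encode("utf-8")

message = ""
try:
    # cp1252 has no character assigned to byte 0x81, so decoding fails
    raw.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError as exc:
    cp1252_ok = False
    message = str(exc)

# Decoding with the encoding the bytes were written in works
decoded = raw.decode("utf-8")

print(cp1252_ok)
print(message)
print(decoded == text)
```

If that assumption holds, the notebook itself is fine and passing encoding="utf-8" to open is the appropriate fix.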

@mwouts mwouts added this to the 1.11.2 milestone Apr 26, 2021
@mwouts
Owner

mwouts commented Apr 26, 2021

Hi @guilhermeferrari, thank you for reporting this. Indeed, I see that some of our calls to open pass an explicit encoding="utf-8" parameter and others do not. We should normalize that and add the encoding parameter everywhere. I'll have a look soon.
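
A rough sketch of what normalizing could look like, assuming all reads and writes are routed through two small helpers. The names read_text_utf8 and write_text_utf8 are hypothetical and are not part of jupytext:

```python
import locale
import tempfile
from pathlib import Path


def read_text_utf8(path):
    """Read a text file, always decoding as UTF-8 (hypothetical helper)."""
    with open(path, encoding="utf-8") as fp:
        return fp.read()


def write_text_utf8(path, text):
    """Write a text file, always encoding as UTF-8 (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(text)


# The encoding open() uses when none is given; commonly cp1252 on Windows
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notebook.py"
    write_text_utf8(path, "# café は\n")
    round_tripped = read_text_utf8(path)

print(round_tripped == "# café は\n")
```

With helpers like these, the result no longer depends on the platform's default encoding, which is the difference between the Windows and Linux/macOS behavior reported above.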

@me-suzy

me-suzy commented Sep 26, 2022

I found another solution HERE; maybe it will help:

# -*- coding: utf-8 -*-
import re

KEYWORD = "英語"
URL = "http://www.google.com/"
content = "和製英語(わせいえいご)とは、日本で作られた英語風の日本語語彙のことである。"
# Match KEYWORD only when it is not inside a tag or an existing link
p = re.compile("(" + KEYWORD + ")(?!(([^<>]*?)>)|([^>]*?</a>))", re.UNICODE)
# Use a raw string so that \1 is a backreference, not the control character \x01
print(p.sub('<a href="' + URL + r'">\1</a>', content))

@me-suzy

me-suzy commented Sep 26, 2022

Or, another solution:

def read_text_from_file(file_path):
    # Decode explicitly as UTF-8 instead of relying on the platform default
    with open(file_path, encoding="utf-8") as f:
        return f.read()

def write_to_file(text, file_path):
    # Encode explicitly as UTF-8; characters that cannot be encoded are dropped
    with open(file_path, "wb") as f:
        f.write(text.encode("utf-8", "ignore"))

@BrunoCiccarino

In my case it was a pip installation error. How can I proceed from here? I have already tried setting the encoding to utf-8 and latin-1, but the error persists.
