InferSent error (help needed) #5

@happypanda5

Hi, I am getting an error while generating InferSent embeddings. The error is shown below, and the full traceback is at the end of this issue:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 11: invalid start byte

The error occurs when I run infer_sent_embs.build_vocab(x_train, tokenize=True).

Note that I ran your code in Google Colab. Also, the InferSent download links in infersent.py have expired and need to be updated.

The new links are:

INFERSENT_GLOVE_MODEL_URL = 'https://dl.fbaipublicfiles.com/infersent/infersent1.pkl'
INFERSENT_FASTTEXT_MODEL_URL = 'https://dl.fbaipublicfiles.com/infersent/infersent2.pkl'

```
UnicodeDecodeError Traceback (most recent call last)
in ()
----> 1 infer_sent_embs.build_vocab(x_train, tokenize=True)
2 x_train_t = infer_sent_embs.encode(x_train, tokenize=True)
3 x_test_t = infer_sent_embs.encode(x_test, tokenize=True)

3 frames
/usr/lib/python3.6/codecs.py in decode(self, input, final)
319 # decode input (taking the buffer into account)
320 data = self.buffer + input
--> 321 (result, consumed) = self._buffer_decode(data, self.errors, final)
322 # keep undecoded input until the next call
323 self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 11: invalid start byte
```
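
For anyone trying to reproduce this: an error of this kind usually means that a file being read as UTF-8 text contains bytes that are not valid UTF-8, for example a binary or partially downloaded file. The sketch below is one way to check a file for that. It assumes (I have not confirmed this) that build_vocab reads the word-vector file as UTF-8 text; the helper name first_invalid_utf8_offset is made up for illustration and is not part of this repository or of InferSent.

```python
import codecs


def first_invalid_utf8_offset(path, chunk_size=1 << 20):
    """Return the byte offset of the first invalid UTF-8 sequence in the
    file at `path`, or None if the whole file decodes cleanly.

    Hypothetical diagnostic helper, not part of any library.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    consumed = 0  # bytes read from the file before the current chunk
    with open(path, "rb") as fh:
        while True:
            # Bytes of an incomplete multi-byte character carried over
            # from the previous chunk; error offsets are relative to
            # these bytes plus the new chunk.
            pending = decoder.getstate()[0]
            chunk = fh.read(chunk_size)
            try:
                # An empty chunk means end of file, so flush the decoder.
                decoder.decode(chunk, final=not chunk)
            except UnicodeDecodeError as err:
                return consumed - len(pending) + err.start
            if not chunk:
                return None
            consumed += len(chunk)
```

Calling it on the word-vector file passed to the model (whatever path that is in your setup) returns the offset of the first offending byte. If that offset is near the start of the file, as with position 11 here, the file is probably not a plain-text vector file at all.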
