Description
Hi, I am getting an error while generating InferSent embeddings. The error is as follows; the full traceback is at the end of this issue.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 11: invalid start byte
The error occurs when I run infer_sent_embs.build_vocab(x_train, tokenize=True).
Note that I ran your code in Google Colab. Separately, the InferSent download links in the Python file infersent.py have expired and need to be updated.
The new links are:
INFERSENT_GLOVE_MODEL_URL = 'https://dl.fbaipublicfiles.com/infersent/infersent1.pkl'
INFERSENT_FASTTEXT_MODEL_URL = 'https://dl.fbaipublicfiles.com/infersent/infersent2.pkl'
```
UnicodeDecodeError                        Traceback (most recent call last)
in ()
----> 1 infer_sent_embs.build_vocab(x_train, tokenize=True)
      2 x_train_t = infer_sent_embs.encode(x_train, tokenize=True)
      3 x_test_t = infer_sent_embs.encode(x_test, tokenize=True)

3 frames
/usr/lib/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 11: invalid start byte
```
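
In case it helps with reproducing this: the traceback ends in codecs.py, which suggests that some file is being read as UTF-8 text when it is not valid UTF-8. I have not confirmed which file that is (the three intermediate frames are collapsed above). Below is a minimal sketch of a check that could be run on any candidate file, for example the word-vector file. The helper name `first_invalid_utf8_offset` is my own and is not part of this repository or of InferSent.

```python
import codecs


def first_invalid_utf8_offset(path, chunk_size=1 << 20):
    """Return the byte offset of the first invalid UTF-8 sequence in the
    file at `path`, or None if the whole file decodes cleanly.

    Hypothetical diagnostic helper, not part of InferSent or this repo.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    pos = 0  # file offset at which the current chunk starts
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            # Bytes of an incomplete multi-byte character carried over
            # from the previous chunk; they precede `chunk` when decoding.
            pending = len(decoder.getstate()[0])
            try:
                # An empty chunk means end of file, so flush with final=True
                # to catch a truncated multi-byte sequence at the very end.
                decoder.decode(chunk, final=not chunk)
            except UnicodeDecodeError as err:
                # err.start is relative to (pending bytes + chunk).
                return pos - pending + err.start
            if not chunk:
                return None
            pos += len(chunk)
```

If this returns a small offset such as 11 for a file that is supposed to be plain text, the file is probably binary (for example a compressed archive or a pickle) or an incomplete download rather than a text file with a few odd characters.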