Chapter 11, part 2, cannot load GloVe #214

Open
@zdjordje123

Description

When trying to run the code that prepares the GloVe word-embedding matrix, towards the end of the notebook for chapter 11, part 2, I get an error:
```python
embedding_dim = 100

# Retrieve the vocabulary indexed by our previous TextVectorization layer.
vocabulary = text_vectorization.get_vocabulary()
# Use it to create a mapping from words to their index in the vocabulary.
word_index = dict(zip(vocabulary, range(len(vocabulary))))

# Prepare a matrix that will be filled with the GloVe vectors.
embedding_matrix = np.zeros((max_tokens, embedding_dim))
for word, i in word_index.items():
    if i < max_tokens:
        embedding_vector = embeddings_index.get(word)
        # Fill entry i in the matrix with the word vector for index i.
        # Words not found in the embedding index will be all zeros.
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector
```
```
UnicodeDecodeError                        Traceback (most recent call last)
Input In [29], in <cell line: 4>()
      1 embedding_dim = 100
      3 # Retrieve the vocabulary indexed by our previous TextVectorization layer.
----> 4 vocabulary = text_vectorization.get_vocabulary()
      5 # Use it to create a mapping from words to their index in the vocabulary.
      6 word_index = dict(zip(vocabulary, range(len(vocabulary))))

File C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\keras\layers\preprocessing\text_vectorization.py:448, in TextVectorization.get_vocabulary(self, include_special_tokens)
    439 def get_vocabulary(self, include_special_tokens=True):
    440     """Returns the current vocabulary of the layer.
    441
    442     Args:
    (...)
    446       vocabulary will not include any padding or OOV tokens.
    447     """
--> 448     return self._lookup_layer.get_vocabulary(include_special_tokens)

File C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\keras\layers\preprocessing\index_lookup.py:336, in IndexLookup.get_vocabulary(self, include_special_tokens)
    334 keys, values = self.lookup_table.export()
    335 vocab, indices = (values, keys) if self.invert else (keys, values)
--> 336 vocab, indices = (self._tensor_vocab_to_numpy(vocab), indices.numpy())
    337 lookup = collections.defaultdict(lambda: self.oov_token,
    338                                  zip(indices, vocab))
    339 vocab = [lookup[x] for x in range(self.vocabulary_size())]

File C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\keras\layers\preprocessing\string_lookup.py:401, in StringLookup._tensor_vocab_to_numpy(self, vocabulary)
    399 def _tensor_vocab_to_numpy(self, vocabulary):
    400     vocabulary = vocabulary.numpy()
--> 401     return np.array([tf.compat.as_text(x, self.encoding) for x in vocabulary])

File C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\keras\layers\preprocessing\string_lookup.py:401, in <listcomp>(.0)
    399 def _tensor_vocab_to_numpy(self, vocabulary):
    400     vocabulary = vocabulary.numpy()
--> 401     return np.array([tf.compat.as_text(x, self.encoding) for x in vocabulary])

File C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\util\compat.py:110, in as_text(bytes_or_text, encoding)
    108   return bytes_or_text
    109 elif isinstance(bytes_or_text, bytes):
--> 110   return bytes_or_text.decode(encoding)
    111 else:
    112   raise TypeError('Expected binary or unicode string, got %r' % bytes_or_text)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 0: unexpected end of data
```
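
The byte `0xc3` is the first byte of a two-byte UTF-8 sequence, so "unexpected end of data" suggests a vocabulary token got cut off mid-character. A possible workaround, an untested sketch that just mirrors what `IndexLookup.get_vocabulary()` does in the traceback above but with a tolerant decode (the private `_lookup_layer` and `lookup_table` attributes are taken from the traceback, not from any public API, and `"[UNK]"` is assumed to be the default OOV token):

```python
# Mirror IndexLookup.get_vocabulary(), but decode tokens ourselves.
# NOTE: _lookup_layer and lookup_table are private Keras internals
# (seen in the traceback), so this may break on other versions.
lookup = text_vectorization._lookup_layer
keys, values = lookup.lookup_table.export()

# errors="replace" turns malformed byte sequences into U+FFFD
# instead of raising UnicodeDecodeError.
index_to_word = {
    index: word.decode("utf-8", errors="replace")
    for word, index in zip(keys.numpy(), values.numpy())
}
# Indices absent from the table fall back to the OOV token
# (assumed to be the default "[UNK]").
vocabulary = [
    index_to_word.get(i, "[UNK]")
    for i in range(lookup.vocabulary_size())
]
word_index = dict(zip(vocabulary, range(len(vocabulary))))
```

With this, any token whose bytes were truncated mid-character just comes back containing a U+FFFD replacement character instead of crashing the whole call; the underlying question of why the stored vocabulary isn't valid UTF-8 in the first place remains open.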
