Most files share similar data-reading code, like nn4nlp-code/01-intro/cbow.py, lines 18 to 22 (at commit a9e8be5).

In most of the examples, the variable `nwords` is used as the effective vocabulary size, for instance when allocating the parameters for the embedding matrix (nn4nlp-code/01-intro/cbow.py, line 30 at a9e8be5):

W_emb = model.add_lookup_parameters((nwords, EMB_SIZE)) # Word embeddings

However, there are likely many new words in the dev/test set that get added to `w2i`: their values are mapped to `UNK`, but they are still counted in `len(w2i)`, which is likely not intended. Often this overcounting does not change the results, but it can be problematic in some cases.