Precision@1: 0.00, @5: 0.00, @10: 0.00 #2

Open
bharathichezhiyan opened this issue Nov 20, 2019 · 2 comments

@bharathichezhiyan

Hi,
I ran your BLISS experiment with the pretrained embeddings downloaded from https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.vec for two languages (English and Tamil). It works great, and I get good results.

But then I tried to create my own embeddings by downloading the wikidump XML (https://dumps.wikimedia.org/tawiki/latest/tawiki-latest-pages-articles.xml.bz2),
extracting it to text (using http://wiki.apertium.org/wiki/Wikipedia_Extractor),
and training word embeddings with fastText (https://fasttext.cc/docs/en/unsupervised-tutorial.html) via `./fasttext skipgram -input input_data_loc/wikidump_ta.txt -output result/ta -dim 300` (a Python-bindings sketch of this step is below).
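
For reference, here is a minimal sketch of the same training step via the `fasttext` Python bindings; the explicit `.vec` writer is a hypothetical helper that mirrors the file the CLI emits, not anything from BLISS:

```python
import fasttext

# Train a skipgram model, mirroring the CLI call above.
model = fasttext.train_unsupervised(
    "input_data_loc/wikidump_ta.txt", model="skipgram", dim=300
)

# Write the vectors in the standard fastText .vec text format:
# a "num_words dim" header, then one "word v1 ... v300" row per word.
with open("result/ta.vec", "w", encoding="utf-8") as f:
    f.write(f"{len(model.words)} {model.get_dimension()}\n")
    for word in model.words:
        values = " ".join(f"{x:.4f}" for x in model.get_word_vector(word))
        f.write(f"{word} {values}\n")
```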

After this I get 0.00 precisions for some languages, and for other languages the error `Dimension out of range (expected to be in range of [-1, 0], but got 1)`.

Here is one of the execution results:

```
Loading faiss with AVX2 support.
2019-11-20 09:53:10,122: INFO: data params
data_dir: ./muse_data/
languages: [{'filename': 'wiki.en.vec', 'name': 'en'}, {'filename': 'ta.vec', 'name': 'ta'}]
mean_center: False
mode: rand
output_dir: ./output/en-ta
supervised: {'fname': 'en-ta.0-5000.txt', 'max_freq': -1}
unit_norm: True
unsupervised: True
save_dir: ./output/en-ta/run-13
2019-11-20 09:53:10,123: INFO: generator parameters
embed_dim: 300
init: eye
2019-11-20 09:53:10,123: INFO: discriminator parameters
dropout_prob: 0.1
embed_dim: 300
hidden_dim: 2048
max_freq: 75000
2019-11-20 09:53:10,123: INFO: GAN parameters
src: en
tgt: ta
2019-11-20 09:53:10,123: INFO: Training Parameters
batch_sz: 32
epochs: 200
eval_batches: 500
factor: {'ortho': 1.0, 'sup': 1.0, 'unsup': 1.0}
iters_per_epoch: 5000
k: 10
log_after: 500
lr_decay: 0.98
lr_local_dk: 0.5
num_disc_rounds: 5
num_gen_rounds: 1
num_nbrs: 100000
num_supervised_rounds: 1
opt: SGD
opt_params: {'lr': 0.1}
ortho_params: {'ortho_type': 'none'}
orthogonal: auto_loss
patience: 2
procrustes_dict_size: 0
procrustes_iters: 3
procrustes_tgt_rank: 15000
procrustes_thresh: 0.0
smoothing: 0.1
eval_metric: unsupervised
sup_opt: SGD
supervised_method: rcsls
2019-11-20 09:53:16,774: INFO: Unit Norming
2019-11-20 09:53:17,026: INFO: Unit Norming
Traceback (most recent call last):
  File "/home/bharaj/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/bharaj/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bharaj/third_work/BLISS/bliss/main.py", line 86, in <module>
    lang.load(w['filename'], data_dir, max_freq=75000)
  File "/home/bharaj/third_work/BLISS/bliss/data/data.py", line 104, in load
    self.embeddings.div_(self.embeddings.norm(2, 1, keepdim=True))
  File "/home/bharaj/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 253, in norm
    return torch.norm(self, p, dim, keepdim, dtype=dtype)
  File "/home/bharaj/anaconda3/lib/python3.6/site-packages/torch/functional.py", line 705, in norm
    return torch._C._VariableFunctions.norm(input, p, dim, keepdim=keepdim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```
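
If it helps, the message looks like what happens when the loaded embeddings tensor comes out 1-D (for example, when no rows parse), since `norm(2, 1, ...)` then asks for a dimension the tensor does not have. A minimal sketch reproducing the same error:

```python
import torch

# A failed or empty load can leave a 1-D tensor instead of a (num_words, dim) matrix.
emb = torch.empty(0)

# norm over dim=1 is only valid for 2-D (or higher) tensors, hence:
# IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
emb.div_(emb.norm(2, 1, keepdim=True))
```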

@codedecde

Hi,
My hunch is that there is something weird about the embedding files. :)
Is there a way you can share them so that we can take a look? In the meantime, you could sanity-check the files with something like the sketch below.
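
A quick check over a `.vec` file (just a sketch; the header and row layout assumed here are the standard fastText text format, and `ta.vec` is the file name from your log):

```python
def check_vec(path, expected_dim=300):
    """Flag .vec rows that do not have exactly one word plus `expected_dim` floats."""
    with open(path, encoding="utf-8") as f:
        # Header line: "num_words dim"
        n_words, dim = map(int, f.readline().split())
        assert dim == expected_dim, f"header says dim={dim}, expected {expected_dim}"
        bad = 0
        for lineno, line in enumerate(f, start=2):
            fields = line.split()
            if len(fields) != dim + 1:  # one word + dim values per row
                bad += 1
                print(f"line {lineno}: {len(fields) - 1} values instead of {dim}")
        print(f"header declares {n_words} words; {bad} malformed rows found")

check_vec("ta.vec")
```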

Thank you!

@bharathichezhiyan
Author

I have shared the embedding files (https://drive.google.com/drive/folders/1rw5LEQaTHPq6GV9hDYY5MtJfxPWmMlpL?usp=sharing). Thanks!
