Precision@1: 0.00, @5: 0.00, @10: 0.00 #2

Open
bharathichezhiyan opened this issue Nov 20, 2019 · 2 comments

@bharathichezhiyan

Hi,
I ran your BLISS experiment with the pretrained embeddings downloaded from https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.vec for two languages (English and Tamil). It works great, and I get good results.

But then I tried to create my own embeddings by downloading the wikidump XML (https://dumps.wikimedia.org/tawiki/latest/tawiki-latest-pages-articles.xml.bz2),
extracting it to text (using http://wiki.apertium.org/wiki/Wikipedia_Extractor),
and training word embeddings with fastText (https://fasttext.cc/docs/en/unsupervised-tutorial.html) via `./fasttext skipgram -input input_data_loc/wikidump_ta.txt -output result/ta -dim 300` (a Python-bindings sketch of this step is below).
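
For reference, here is a minimal sketch of the same training step via the `fasttext` Python bindings; the explicit `.vec` writer is a hypothetical helper that mirrors the file the CLI emits, not anything from BLISS:

```python
import fasttext

# Train a skipgram model, mirroring the CLI call above.
model = fasttext.train_unsupervised(
    "input_data_loc/wikidump_ta.txt", model="skipgram", dim=300
)

# Write the vectors in the standard fastText .vec text format:
# a "num_words dim" header, then one "word v1 ... v300" row per word.
with open("result/ta.vec", "w", encoding="utf-8") as f:
    f.write(f"{len(model.words)} {model.get_dimension()}\n")
    for word in model.words:
        values = " ".join(f"{x:.4f}" for x in model.get_word_vector(word))
        f.write(f"{word} {values}\n")
```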

After this I get 0.00 precisions for some languages, and for other languages the error `Dimension out of range (expected to be in range of [-1, 0], but got 1)`.

Here is one of the execution results:

```
Loading faiss with AVX2 support.
2019-11-20 09:53:10,122: INFO: data params
data_dir: ./muse_data/
languages: [{'filename': 'wiki.en.vec', 'name': 'en'}, {'filename': 'ta.vec', 'name': 'ta'}]
mean_center: False
mode: rand
output_dir: ./output/en-ta
supervised: {'fname': 'en-ta.0-5000.txt', 'max_freq': -1}
unit_norm: True
unsupervised: True
save_dir: ./output/en-ta/run-13
2019-11-20 09:53:10,123: INFO: generator parameters
embed_dim: 300
init: eye
2019-11-20 09:53:10,123: INFO: discriminator parameters
dropout_prob: 0.1
embed_dim: 300
hidden_dim: 2048
max_freq: 75000
2019-11-20 09:53:10,123: INFO: GAN parameters
src: en
tgt: ta
2019-11-20 09:53:10,123: INFO: Training Parameters
batch_sz: 32
epochs: 200
eval_batches: 500
factor: {'ortho': 1.0, 'sup': 1.0, 'unsup': 1.0}
iters_per_epoch: 5000
k: 10
log_after: 500
lr_decay: 0.98
lr_local_dk: 0.5
num_disc_rounds: 5
num_gen_rounds: 1
num_nbrs: 100000
num_supervised_rounds: 1
opt: SGD
opt_params: {'lr': 0.1}
ortho_params: {'ortho_type': 'none'}
orthogonal: auto_loss
patience: 2
procrustes_dict_size: 0
procrustes_iters: 3
procrustes_tgt_rank: 15000
procrustes_thresh: 0.0
smoothing: 0.1
eval_metric: unsupervised
sup_opt: SGD
supervised_method: rcsls
2019-11-20 09:53:16,774: INFO: Unit Norming
2019-11-20 09:53:17,026: INFO: Unit Norming
Traceback (most recent call last):
  File "/home/bharaj/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/bharaj/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bharaj/third_work/BLISS/bliss/main.py", line 86, in <module>
    lang.load(w['filename'], data_dir, max_freq=75000)
  File "/home/bharaj/third_work/BLISS/bliss/data/data.py", line 104, in load
    self.embeddings.div_(self.embeddings.norm(2, 1, keepdim=True))
  File "/home/bharaj/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 253, in norm
    return torch.norm(self, p, dim, keepdim, dtype=dtype)
  File "/home/bharaj/anaconda3/lib/python3.6/site-packages/torch/functional.py", line 705, in norm
    return torch._C._VariableFunctions.norm(input, p, dim, keepdim=keepdim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```
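
If it helps, the message looks like what happens when the loaded embeddings tensor comes out 1-D (for example, when no rows parse), since `norm(2, 1, ...)` then asks for a dimension the tensor does not have. A minimal sketch reproducing the same error:

```python
import torch

# A failed or empty load can leave a 1-D tensor instead of a (num_words, dim) matrix.
emb = torch.empty(0)

# norm over dim=1 is only valid for 2-D (or higher) tensors, hence:
# IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
emb.div_(emb.norm(2, 1, keepdim=True))
```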

@codedecde

Hi,
My hunch is that there is something weird about the embedding files. :)
Is there a way you can share them so that we can take a look? In the meantime, you could sanity-check the files with something like the sketch below.
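
A quick check over a `.vec` file (just a sketch; the header and row layout assumed here are the standard fastText text format, and `ta.vec` is the file name from your log):

```python
def check_vec(path, expected_dim=300):
    """Flag .vec rows that do not have exactly one word plus `expected_dim` floats."""
    with open(path, encoding="utf-8") as f:
        # Header line: "num_words dim"
        n_words, dim = map(int, f.readline().split())
        assert dim == expected_dim, f"header says dim={dim}, expected {expected_dim}"
        bad = 0
        for lineno, line in enumerate(f, start=2):
            fields = line.split()
            if len(fields) != dim + 1:  # one word + dim values per row
                bad += 1
                print(f"line {lineno}: {len(fields) - 1} values instead of {dim}")
        print(f"header declares {n_words} words; {bad} malformed rows found")

check_vec("ta.vec")
```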

Thank you!

@bharathichezhiyan
Author

I have shared the embedding files (https://drive.google.com/drive/folders/1rw5LEQaTHPq6GV9hDYY5MtJfxPWmMlpL?usp=sharing). Thanks!
