Instead of recomputing the embeddings, you can access the embeddings used in the paper through the following links. Note that sign flip was not applied to the ICA-transformed embeddings to ensure that the skewness of the axes remains positive.
Place the downloaded files under the directory output/raw_embeddings as shown below:
$ ls output/raw_embeddings/
raw_bert.pkl raw_glove.pkl raw_word2vec.pkl

Place the downloaded files under the directory output/pca_ica_embeddings/ as shown below:
$ ls output/pca_ica_embeddings/
pca_ica_bert.pkl pca_ica_glove.pkl pca_ica_word2vec.pkl

- GloVe (Google Drive) for $k=1,10,1000$
- word2vec (Google Drive) for $k=100$
- BERT (Google Drive) for $k=100$

Place the downloaded files under the directory output/axistour_embeddings/ as shown below:
$ ls output/axistour_embeddings/
axistour_top1000_glove.pkl axistour_top100_bert.pkl axistour_top100_glove.pkl axistour_top100_word2vec.pkl axistour_top10_glove.pkl axistour_top1_glove.pkl

Place the downloaded files under the directory output/tica_embeddings/ as shown below:
$ ls output/tica_embeddings/
tica_width75_glove.pkl tica_width9_glove.pkl

Download it from the following link:
Please place the data as in data/1-billion-word-language-modeling-benchmark-r13output/.
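
Before moving on, it can help to confirm that the precomputed embedding files from the `ls` listings above ended up in the right directories. The following is a minimal sketch, not part of the repository: the file names are copied from those listings, while `EXPECTED_FILES` and `find_missing_files` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# File names copied from the `ls` listings in this README.
# The dictionary and the function name below are illustrative, not repo code.
EXPECTED_FILES = {
    "output/raw_embeddings": [
        "raw_bert.pkl",
        "raw_glove.pkl",
        "raw_word2vec.pkl",
    ],
    "output/pca_ica_embeddings": [
        "pca_ica_bert.pkl",
        "pca_ica_glove.pkl",
        "pca_ica_word2vec.pkl",
    ],
    "output/axistour_embeddings": [
        "axistour_top1000_glove.pkl",
        "axistour_top100_bert.pkl",
        "axistour_top100_glove.pkl",
        "axistour_top100_word2vec.pkl",
        "axistour_top10_glove.pkl",
        "axistour_top1_glove.pkl",
    ],
    "output/tica_embeddings": [
        "tica_width75_glove.pkl",
        "tica_width9_glove.pkl",
    ],
}


def find_missing_files(root="."):
    """Return the expected paths (relative to root) that do not exist."""
    root = Path(root)
    missing = []
    for directory, names in EXPECTED_FILES.items():
        for name in names:
            relative = Path(directory) / name
            if not (root / relative).is_file():
                missing.append(relative.as_posix())
    return missing


if __name__ == "__main__":
    missing = find_missing_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected embedding files are in place.")
```

Run it from the repository root. An empty result only means the files exist; it does not validate their contents.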
Make the data/embeddings/word2vec directory.
mkdir -p data/embeddings/word2vec

Then download it from the following link:
GoogleNews-vectors-negative300.bin.gz (Google Cloud)
Please place the data as in data/embeddings/word2vec/GoogleNews-vectors-negative300.bin.
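
The download is a `.bin.gz` archive, while the target path above ends in `.bin`, so the file presumably needs to be decompressed first. A minimal sketch using only the standard library is shown below. It assumes the archive was saved into data/embeddings/word2vec/; `decompress_gzip` is a hypothetical helper name, not something the repository provides.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gzip(src, dst):
    """Stream-decompress the gzip file src into dst and return dst as a Path."""
    src, dst = Path(src), Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj streams in chunks, so the large model file is never
    # held in memory all at once.
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


if __name__ == "__main__":
    # Assumed download location; adjust if the archive was saved elsewhere.
    archive = Path("data/embeddings/word2vec/GoogleNews-vectors-negative300.bin.gz")
    if archive.is_file():
        # with_suffix("") drops only the trailing ".gz".
        print(decompress_gzip(archive, archive.with_suffix("")))
    else:
        print(f"Archive not found: {archive}")
```

Running `gunzip` on the archive from the shell achieves the same result.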
# word2vec
python save_raw_embeddings.py --emb_type word2vec
python save_pca_and_ica_embeddings.py --emb_type word2vec
python save_axistour_embeddings.py --emb_type word2vec --topk 100
# bert
python save_raw_embeddings.py --emb_type bert
python save_pca_and_ica_embeddings.py --emb_type bert
python save_axistour_embeddings.py --emb_type bert --topk 100

If you are not using adjustText==1.0.4, you may need to manually adjust the position of the text.
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 23 --length 9
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 101 --length 9
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 237 --length 9

| (a) The 23rd axis to the 31st axis | (b) The 101st axis to the 109th axis | (c) The 237th axis to the 245th axis |
|---|---|---|
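
The panel captions follow directly from the command-line arguments: judging by the captions above, `--left_axis_index` is the first axis shown and `--length` is the number of consecutive axes, so the last axis is `left_axis_index + length - 1`. The sketch below reproduces that arithmetic. The function names are hypothetical, and whether the script itself treats the index as 0-based or 1-based is not stated here.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 101 -> '101st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def axis_range(left_axis_index, length):
    """Return the first and last axis covered by one scatterplot panel."""
    return left_axis_index, left_axis_index + length - 1


def caption(left_axis_index, length):
    """Build a caption in the style used by the tables in this README."""
    first, last = axis_range(left_axis_index, length)
    return f"The {ordinal(first)} axis to the {ordinal(last)} axis"


if __name__ == "__main__":
    for left in (23, 101, 237):
        print(caption(left, 9))
```

For example, `caption(23, 9)` gives "The 23rd axis to the 31st axis", matching panel (a).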
python make_scatterplots.py --emb_type word2vec --topk 100 --left_axis_index 50 --length 10
python make_scatterplots.py --emb_type word2vec --topk 100 --left_axis_index 150 --length 10
python make_scatterplots.py --emb_type word2vec --topk 100 --left_axis_index 250 --length 10

| (a) The 50th axis to the 59th axis | (b) The 150th axis to the 159th axis | (c) The 250th axis to the 259th axis |
|---|---|---|
python make_scatterplots.py --emb_type bert --topk 100 --left_axis_index 50 --length 10
python make_scatterplots.py --emb_type bert --topk 100 --left_axis_index 150 --length 10
python make_scatterplots.py --emb_type bert --topk 100 --left_axis_index 250 --length 10

| (a) The 50th axis to the 59th axis | (b) The 150th axis to the 159th axis | (c) The 250th axis to the 259th axis |
|---|---|---|
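
Label placement in the scatterplots above depends on the adjustText version, so it may be worth checking what is installed before comparing your figures with the paper's. The following sketch uses the standard library's `importlib.metadata`. The helper names are hypothetical, and the pinned version is the one stated in this README.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pinned(package, pinned):
    """Return True only if package is installed at exactly the pinned version."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    pinned = "1.0.4"  # version named in this README
    found = installed_version("adjustText")
    if found is None:
        print("adjustText is not installed.")
    elif found != pinned:
        print(f"adjustText {found} found; text positions may differ from {pinned}.")
    else:
        print(f"adjustText {found} matches the pinned version.")
```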
python make_3d_figure.py --emb_type glove --topk 100 --start_axis_index 89

python make_comparing_k.py --emb_type glove

python make_dimred_figure.py --emb_type glove --fig_type alpha

python make_dimred_figure.py --emb_type glove --fig_type topk

python make_dimred_figure.py --emb_type glove --fig_type projection

python save_tica_embeddings.py --emb_type glove --width 9
python save_tica_embeddings.py --emb_type glove --width 75

If you are not using adjustText==1.0.4, you may need to manually adjust the position of the text.
python make_scatterplots_tica.py --emb_type glove --width 9 --left_axis_index 50 --length 10
python make_scatterplots_tica.py --emb_type glove --width 9 --left_axis_index 150 --length 10
python make_scatterplots_tica.py --emb_type glove --width 9 --left_axis_index 250 --length 10

| (a) The 50th axis to the 59th axis | (b) The 150th axis to the 159th axis | (c) The 250th axis to the 259th axis |
|---|---|---|
python make_scatterplots_tica.py --emb_type glove --width 75 --left_axis_index 50 --length 10
python make_scatterplots_tica.py --emb_type glove --width 75 --left_axis_index 150 --length 10
python make_scatterplots_tica.py --emb_type glove --width 75 --left_axis_index 250 --length 10

| (a) The 50th axis to the 59th axis | (b) The 150th axis to the 159th axis | (c) The 250th axis to the 259th axis |
|---|---|---|
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 50 --length 10
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 150 --length 10
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 250 --length 10

| (a) The 50th axis to the 59th axis | (b) The 150th axis to the 159th axis | (c) The 250th axis to the 259th axis |
|---|---|---|
python make_cossim_histogram_and_scatterplot_tica.py --emb_type glove

| Figure 21 | Figure 22 |
|---|---|
# Axis Tour
python eval_avg_d_I_and_avg_c_I.py
# TICA
python eval_avg_d_I_and_avg_c_I_tica.py --width 9
python eval_avg_d_I_and_avg_c_I_tica.py --width 75

python make_higher_order_histogram.py --emb_type glove

python make_dimred_figure.py --emb_type glove --fig_type tica
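
The word2vec and BERT commands earlier in this README run the same three scripts in the same order (raw embeddings, then PCA and ICA, then Axis Tour), so they can be chained in a small driver. The sketch below is not part of the repository. It uses only the script names and flags shown above, and `build_pipeline` and `run_pipeline` are hypothetical helper names.

```python
import subprocess
import sys


def build_pipeline(emb_type, topk=100):
    """Return the three commands, in order, for one embedding type."""
    python = sys.executable  # reuse the interpreter running this driver
    return [
        [python, "save_raw_embeddings.py", "--emb_type", emb_type],
        [python, "save_pca_and_ica_embeddings.py", "--emb_type", emb_type],
        [python, "save_axistour_embeddings.py", "--emb_type", emb_type,
         "--topk", str(topk)],
    ]


def run_pipeline(emb_type, topk=100, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_pipeline(emb_type, topk):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails,
            # so later steps never run on missing inputs.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: prints the commands without executing them.
    for emb_type in ("word2vec", "bert"):
        run_pipeline(emb_type, topk=100, dry_run=True)
```

Pass `dry_run=False` from the repository root to actually execute the steps.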