Appendix

Preliminary

Access to the embeddings used in our paper

Instead of recomputing the embeddings, you can download the ones used in the paper from the following links. Note that the sign flip that makes the skewness of every axis positive was not applied to the ICA-transformed embeddings.
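If you need every axis to have positive skewness, the sign flip mentioned above can be sketched as follows. This is a minimal, dependency-free illustration on plain Python lists; the helper names `skewness` and `flip_to_positive_skewness` are hypothetical and are not taken from this repository, and whether the repository's scripts perform the flip in exactly this way is not confirmed here.

```python
from typing import List


def skewness(values: List[float]) -> float:
    """Skewness as the mean cubed deviation divided by std**3."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return 0.0  # a constant axis has no skew
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


def flip_to_positive_skewness(rows: List[List[float]]) -> List[List[float]]:
    """Negate every axis (column) whose skewness is negative."""
    n_axes = len(rows[0])
    signs = []
    for j in range(n_axes):
        column = [row[j] for row in rows]
        signs.append(-1.0 if skewness(column) < 0 else 1.0)
    return [[s * x for s, x in zip(signs, row)] for row in rows]


# Toy data: axis 0 has a long right tail, axis 1 a long left tail.
toy = [[0.0, 0.0], [0.1, -0.1], [0.2, -0.2], [5.0, -5.0]]
flipped = flip_to_positive_skewness(toy)
```

With real embeddings the same logic is usually written with vectorized array operations; the list version is only meant to make the per-axis rule explicit.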

Raw embeddings

Place the downloaded files under the directory output/raw_embeddings as shown below:

$ ls output/raw_embeddings/
raw_bert.pkl  raw_glove.pkl  raw_word2vec.pkl

PCA-transformed and ICA-transformed embeddings

Place the downloaded files under the directory output/pca_ica_embeddings/ as shown below:

$ ls output/pca_ica_embeddings/
pca_ica_bert.pkl  pca_ica_glove.pkl  pca_ica_word2vec.pkl

Axis Tour embeddings

Place the downloaded files under the directory output/axistour_embeddings/ as shown below:

$ ls output/axistour_embeddings/
axistour_top1000_glove.pkl  axistour_top100_bert.pkl  axistour_top100_glove.pkl  axistour_top100_word2vec.pkl  axistour_top10_glove.pkl  axistour_top1_glove.pkl

TICA9-transformed and TICA75-transformed GloVe embeddings

Place the downloaded files under the directory output/tica_embeddings/ as shown below:

$ ls output/tica_embeddings/
tica_width75_glove.pkl  tica_width9_glove.pkl 
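Before running the figure scripts, it can help to confirm that every downloaded file landed in the right place. The sketch below is not part of the repository; it only checks for the file names shown in the `ls` listings above, and the function name `missing_files` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# File names copied from the `ls` listings above.
EXPECTED: Dict[str, List[str]] = {
    "raw_embeddings": ["raw_bert.pkl", "raw_glove.pkl", "raw_word2vec.pkl"],
    "pca_ica_embeddings": [
        "pca_ica_bert.pkl",
        "pca_ica_glove.pkl",
        "pca_ica_word2vec.pkl",
    ],
    "axistour_embeddings": [
        "axistour_top1000_glove.pkl",
        "axistour_top100_bert.pkl",
        "axistour_top100_glove.pkl",
        "axistour_top100_word2vec.pkl",
        "axistour_top10_glove.pkl",
        "axistour_top1_glove.pkl",
    ],
    "tica_embeddings": ["tica_width75_glove.pkl", "tica_width9_glove.pkl"],
}


def missing_files(output_dir: str = "output") -> List[Path]:
    """Return the expected files that are not present under output_dir."""
    root = Path(output_dir)
    return [
        root / sub / name
        for sub, names in EXPECTED.items()
        for name in names
        if not (root / sub / name).is_file()
    ]


if __name__ == "__main__":
    for path in missing_files():
        print(f"missing: {path}")
```

If you only need a subset of the experiments (for example GloVe only), some of the reported files can safely stay missing.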

Download datasets to compute BERT embeddings for reproducibility experiments (if necessary)

One Billion Word Benchmark [3] for BERT embeddings

Download it from the following link:

Place the data at data/1-billion-word-language-modeling-benchmark-r13output/.

Download word2vec for reproducibility experiments (if necessary)

Make the data/embeddings/word2vec directory.

mkdir -p data/embeddings/word2vec

Then download it from the following link:

GoogleNews-vectors-negative300.bin.gz (Google Cloud)

Decompress the archive and place the resulting file at data/embeddings/word2vec/GoogleNews-vectors-negative300.bin.
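The download is a `.bin.gz` archive while the expected file is a `.bin`, so the archive has to be decompressed first. A minimal sketch of that step is shown below; it assumes the archive was saved into `data/embeddings/word2vec/`, which is an assumption about where you downloaded it, and the helper name `gunzip` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: Path, dst: Path) -> None:
    """Stream-decompress a .gz file to dst without loading it into memory."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


if __name__ == "__main__":
    target_dir = Path("data/embeddings/word2vec")
    # Assumed download location of the archive.
    archive = target_dir / "GoogleNews-vectors-negative300.bin.gz"
    if archive.is_file():
        gunzip(archive, target_dir / "GoogleNews-vectors-negative300.bin")
```

Running `gunzip` from the shell on the archive achieves the same result.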


Code

Save embeddings for reproducibility experiments

# word2vec
python save_raw_embeddings.py --emb_type word2vec
python save_pca_and_ica_embeddings.py --emb_type word2vec
python save_axistour_embeddings.py --emb_type word2vec --topk 100

# bert
python save_raw_embeddings.py --emb_type bert
python save_pca_and_ica_embeddings.py --emb_type bert
python save_axistour_embeddings.py --emb_type bert --topk 100
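The three commands per embedding type are listed in a fixed order, presumably because each step reads the previous step's output. The sketch below chains them from Python and stops at the first failure; the script names and flags are the ones shown above, while the helper names `build_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys
from typing import List


def build_commands(emb_type: str, topk: int = 100) -> List[List[str]]:
    """Return the three save commands in the order listed in this README."""
    return [
        [sys.executable, "save_raw_embeddings.py", "--emb_type", emb_type],
        [sys.executable, "save_pca_and_ica_embeddings.py", "--emb_type", emb_type],
        [
            sys.executable,
            "save_axistour_embeddings.py",
            "--emb_type",
            emb_type,
            "--topk",
            str(topk),
        ],
    ]


def run_pipeline(emb_type: str, topk: int = 100) -> None:
    """Run the steps in order; check=True aborts on the first failing step."""
    for cmd in build_commands(emb_type, topk):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("word2vec")` followed by `run_pipeline("bert")` would mirror the two command groups above, assuming it is run from the repository root.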

Scatterplots

If you are not using adjustText==1.0.4, you may need to manually adjust the position of the text.
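To find out which adjustText version is installed before generating the figures, a small check like the following may help. It uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("adjustText")
if found != "1.0.4":
    print(f"adjustText {found or 'not installed'}: label positions may differ")
```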

glove

python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 23 --length 9
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 101 --length 9
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 237 --length 9
(a) The 23rd axis to the 31st axis (b) The 101st axis to the 109th axis (c) The 237th axis to the 245th axis
fig. 6a fig. 6b fig. 6c
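Judging from the captions, `--left_axis_index` is the first axis plotted and `--length` is the number of consecutive axes, so the last axis is `left_axis_index + length - 1`. This reading is inferred from the captions rather than from the script's source, and the helper `axis_range` is hypothetical.

```python
from typing import Tuple


def axis_range(left_axis_index: int, length: int) -> Tuple[int, int]:
    """Return the (first, last) axis covered by a scatterplot, inclusive."""
    return left_axis_index, left_axis_index + length - 1


print(axis_range(23, 9))    # (23, 31)
print(axis_range(101, 9))   # (101, 109)
print(axis_range(237, 9))   # (237, 245)
```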

word2vec

python make_scatterplots.py --emb_type word2vec --topk 100 --left_axis_index 50 --length 10
python make_scatterplots.py --emb_type word2vec --topk 100 --left_axis_index 150 --length 10
python make_scatterplots.py --emb_type word2vec --topk 100 --left_axis_index 250 --length 10
(a) The 50th axis to the 59th axis (b) The 150th axis to the 159th axis (c) The 250th axis to the 259th axis
fig. 6a fig. 6b fig. 6c

BERT

python make_scatterplots.py --emb_type bert --topk 100 --left_axis_index 50 --length 10
python make_scatterplots.py --emb_type bert --topk 100 --left_axis_index 150 --length 10
python make_scatterplots.py --emb_type bert --topk 100 --left_axis_index 250 --length 10
(a) The 50th axis to the 59th axis (b) The 150th axis to the 159th axis (c) The 250th axis to the 259th axis
fig. 6a fig. 6b fig. 6c

3D projection

python make_3d_figure.py --emb_type glove --topk 100 --start_axis_index 89
fig.7

Comparing $k$

python make_comparing_k.py --emb_type glove
fig.9

Dimensionality reduction

Comparison of $\alpha$

python make_dimred_figure.py --emb_type glove --fig_type alpha
fig.8

Comparison of $k$

python make_dimred_figure.py --emb_type glove --fig_type topk
fig.10

Skewness Sort Projection and Random Order Projection

python make_dimred_figure.py --emb_type glove --fig_type projection
fig.10

TICA

Save embeddings for reproducibility experiments

python save_tica_embeddings.py --emb_type glove --width 9
python save_tica_embeddings.py --emb_type glove --width 75

Scatterplots

If you are not using adjustText==1.0.4, you may need to manually adjust the position of the text.

TICA9
python make_scatterplots_tica.py --emb_type glove --width 9 --left_axis_index 50 --length 10
python make_scatterplots_tica.py --emb_type glove --width 9 --left_axis_index 150 --length 10
python make_scatterplots_tica.py --emb_type glove --width 9 --left_axis_index 250 --length 10
(a) The 50th axis to the 59th axis (b) The 150th axis to the 159th axis (c) The 250th axis to the 259th axis
fig. 20a fig. 20b fig. 20c
TICA75
python make_scatterplots_tica.py --emb_type glove --width 75 --left_axis_index 50 --length 10
python make_scatterplots_tica.py --emb_type glove --width 75 --left_axis_index 150 --length 10
python make_scatterplots_tica.py --emb_type glove --width 75 --left_axis_index 250 --length 10
(a) The 50th axis to the 59th axis (b) The 150th axis to the 159th axis (c) The 250th axis to the 259th axis
fig. 20a fig. 20b fig. 20c
glove
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 50 --length 10
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 150 --length 10
python make_scatterplots.py --emb_type glove --topk 100 --left_axis_index 250 --length 10
(a) The 50th axis to the 59th axis (b) The 150th axis to the 159th axis (c) The 250th axis to the 259th axis
fig. 20a fig. 20b fig. 20c

Histograms and scatterplot of cossim

python make_cossim_histogram_and_scatterplot_tica.py --emb_type glove
Figure 21 Figure 22

Avg. $d_I$ and Avg. $c_I$

# Axis Tour
python eval_avg_d_I_and_avg_c_I.py
# TICA
python eval_avg_d_I_and_avg_c_I_tica.py --width 9
python eval_avg_d_I_and_avg_c_I_tica.py --width 75

Histograms of higher-order correlation

python make_higher_order_histogram.py --emb_type glove
fig.24

Dimensionality reduction

python make_dimred_figure.py --emb_type glove --fig_type tica
fig.23