A small project on algorithms for piecewise orthogonal alignment of vectors, with an application to vector embeddings. The main contribution is a dynamic-programming algorithm that finds the optimal clustering (along a given 1-dimensional ordering of the data) together with cluster-wise orthogonal transformations minimizing the squared alignment error. See the writeup for details.
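To make the recurrence concrete, here is a minimal sketch of the idea (an illustration under my own assumptions, not the repo's `piecewise_procrustes` implementation): each segment's optimal orthogonal map is the classic Procrustes solution, whose cost reduces to a nuclear norm, and the DP minimizes total cost plus a hypothetical per-segment penalty `lam` (the actual solver may instead fix the number of segments).

```python
import numpy as np

def segment_cost(X, Y, i, j):
    # Optimal Procrustes cost for aligning rows X[i:j] to Y[i:j]:
    #   min_{R orthogonal} sum_t ||X[t] @ R - Y[t]||^2
    # = ||X[i:j]||_F^2 + ||Y[i:j]||_F^2 - 2 * ||X[i:j].T @ Y[i:j]||_*
    Xs, Ys = X[i:j], Y[i:j]
    nuc = np.linalg.norm(Xs.T @ Ys, ord="nuc")  # nuclear norm via SVD
    return (Xs ** 2).sum() + (Ys ** 2).sum() - 2.0 * nuc

def piecewise_alignment(X, Y, lam=0.0):
    """DP over prefixes: best[j] = min_{i<j} best[i] + cost(i, j) + lam."""
    n = len(X)
    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + segment_cost(X, Y, i, j) + lam
            if c < best[j]:
                best[j], back[j] = c, i
    # Recover segment boundaries from the back-pointers.
    cuts, j = [], n
    while j > 0:
        cuts.append(int(j))
        j = back[j]
    return best[n], sorted(cuts)
```

Note the naive double loop evaluates a segment cost for every pair $(i, j)$, which is exactly the quadratic number of nuclear-norm computations mentioned below.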
Visualization of the DP algorithm:
Install dependencies via `pip install -r requirements.txt`.
`experiments/upstream/main_evaluation.py` generates a synthetic temporal-shift dataset that is favorable to piecewise alignment and compares baseline approaches (`global_procrustes`, `kmeans`) with the DP solver. Run it with `python experiments/upstream/main_evaluation.py`.

`experiments/downstream/partial_upgrade.py` evaluates a partial upgrade: documents are indexed with Model A, queries come from Model B, and we use vector alignment to learn a map from Model B to Model A that is applied before retrieval. The script reports Recall on held-out pairs along with train/test residuals. Run it with `python experiments/downstream/partial_upgrade.py`.

The DP solver performs a quadratic number of nuclear-norm computations. By default, the nuclear norm is computed exactly, but we also include a fast approximation based on Stochastic Lanczos Quadrature (SLQ). Pass `--nvecs` or `--steps` to tune the approximation.
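For reference, SLQ estimates $\|M\|_* = \mathrm{tr}\big((M^\top M)^{1/2}\big)$ by combining Hutchinson-style random probes with Gauss quadrature from the Lanczos tridiagonalization. A self-contained sketch (the parameter names mirror the `--nvecs`/`--steps` flags, but this is not the repo's `fast_nuclear_norm` implementation):

```python
import numpy as np

def slq_nuclear_norm(M, nvecs=10, steps=20, seed=0):
    """Estimate ||M||_* = tr((M^T M)^{1/2}) via Stochastic Lanczos Quadrature."""
    rng = np.random.default_rng(seed)
    d = M.shape[1]
    A = M.T @ M  # for large M, apply A implicitly as M.T @ (M @ q)
    total = 0.0
    for _ in range(nvecs):
        v = rng.choice([-1.0, 1.0], size=d)  # Rademacher probe
        v /= np.linalg.norm(v)
        # Lanczos tridiagonalization of A started at v
        alphas, betas = [], []
        q_prev, q, beta = np.zeros(d), v, 0.0
        for _ in range(steps):
            w = A @ q - beta * q_prev
            alpha = q @ w
            w -= alpha * q
            alphas.append(alpha)
            beta = np.linalg.norm(w)
            if beta < 1e-12:  # invariant subspace found; quadrature is exact
                break
            betas.append(beta)
            q_prev, q = q, w / beta
        k = len(alphas)
        T = np.diag(alphas) + np.diag(betas[:k - 1], 1) + np.diag(betas[:k - 1], -1)
        theta, U = np.linalg.eigh(T)  # Ritz values and weights
        tau2 = U[0, :] ** 2
        # Gauss quadrature for v^T sqrt(A) v; clip guards tiny negative Ritz values
        total += np.sum(tau2 * np.sqrt(np.clip(theta, 0.0, None)))
    return d / nvecs * total
```

More probes (`nvecs`) reduce the variance of the trace estimate; more Lanczos steps (`steps`) sharpen the quadrature for each probe.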
- `algorithms/`: implementations of `baseline_procrustes`, `piecewise_procrustes`, and `fast_nuclear_norm`
- `experiments/upstream/`: alignment error and runtime on a synthetic dataset, plus the algorithm visualization
- `experiments/downstream/`: a retrieval task where vector alignment improves accuracy
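The downstream partial-upgrade setup can be sketched on toy data (all names and data here are hypothetical stand-ins, not the code in `experiments/downstream/partial_upgrade.py`): fit an orthogonal map from Model B's space to Model A's on paired training embeddings, then rotate queries before nearest-neighbor retrieval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: Model A (index) and Model B (query) embeddings of the same
# 100 documents, related by an unknown rotation plus a little noise.
A = rng.standard_normal((100, 16))
R_true, _ = np.linalg.qr(rng.standard_normal((16, 16)))
B = A @ R_true.T + 0.01 * rng.standard_normal((100, 16))

train, test = np.arange(80), np.arange(80, 100)

# Orthogonal Procrustes fit: map Model B -> Model A using the training pairs.
U, _, Vt = np.linalg.svd(B[train].T @ A[train])
R = U @ Vt

# Retrieval: each held-out "query" (Model B) should find its own document
# in the Model A index.
queries = B[test] @ R
sims = queries @ A.T                      # inner-product scores
recall_at_1 = (sims.argmax(axis=1) == test).mean()
```

With low noise the learned `R` recovers the true rotation and recall is near perfect; the interesting regime is when the two models are *not* related by a single global rotation, which is where piecewise alignment is meant to help.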
- Runtime. Save on the $n^2$ runtime factor with a better algorithmic idea (e.g., sketching the Frobenius norm as a proxy for the nuclear norm).
- Quality. Find a setting or downstream task where `piecewise_procrustes` can yield substantially better results than naive methods; currently, the results do not look too promising.
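One possible reading of the runtime idea (a hypothetical sketch, not implemented in this repo): the Frobenius norm satisfies $\|A\|_F \le \|A\|_* \le \sqrt{\mathrm{rank}(A)}\,\|A\|_F$, and unlike the nuclear norm it can be estimated cheaply with a Gaussian sketch, since $\mathbb{E}\,\|SA\|_F^2 = \|A\|_F^2$ when $S$ has i.i.d. $\mathcal{N}(0, 1/k)$ entries.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 200, 50, 32  # k-row sketch of an m x n matrix

A = rng.standard_normal((m, n))

# Gaussian sketch: E ||S A||_F^2 = ||A||_F^2 for S with i.i.d. N(0, 1/k) entries.
S = rng.standard_normal((k, m)) / np.sqrt(k)
est = np.linalg.norm(S @ A, "fro") ** 2
exact = np.linalg.norm(A, "fro") ** 2
```

Whether this proxy is accurate enough to prune segment-cost evaluations in the DP is exactly the open question above.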
