
@luciaquirke (Collaborator) commented on Dec 14, 2025

Adds a more VRAM-efficient variant in which preconditioners can be spread across an arbitrary number of nodes, making it possible to compute large outer products. This is useful because a preconditioner is often applied to a query, and the preconditioned query is then run across a large dataset. Because that dataset pass dominates the cost, slow but VRAM-efficient preconditioner computation and application is a scalable pattern.
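A minimal sketch of the sharding idea, assuming the preconditioner is built from the gradient outer product `G.T @ G` (an illustrative choice; the PR does not spell out the exact preconditioner). Each of `world_size` simulated ranks owns a contiguous block of rows of the `d x d` matrix, so no single rank materializes the full matrix, and multiplying by a query yields one slice of the output per rank. The helper names (`shard_bounds`, `local_outer_product_shard`, `sharded_matvec`) are hypothetical, the ranks are simulated in a loop rather than run as separate processes, and inverting the sharded matrix (which would need a distributed solver) is not shown.

```python
import numpy as np


def shard_bounds(dim: int, world_size: int, rank: int) -> tuple[int, int]:
    """Row range [lo, hi) owned by `rank` when `dim` rows are split as evenly as possible."""
    base, extra = divmod(dim, world_size)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def local_outer_product_shard(grads: np.ndarray, world_size: int, rank: int) -> np.ndarray:
    """Rows [lo, hi) of grads.T @ grads, i.e. a (hi - lo) x d block.

    Only this block is materialized on the rank, never the full d x d matrix.
    """
    lo, hi = shard_bounds(grads.shape[1], world_size, rank)
    return grads[:, lo:hi].T @ grads


def sharded_matvec(shards: list[np.ndarray], query: np.ndarray) -> np.ndarray:
    """Multiply the row-sharded matrix by `query`; each shard yields its slice of the output."""
    # In a real multi-process run each rank would compute only its own slice and
    # the slices would be combined with an all-gather; here we just concatenate.
    return np.concatenate([shard @ query for shard in shards])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, world_size = 32, 10, 3
    grads = rng.standard_normal((n, d))  # per-example gradients, one row each
    query = rng.standard_normal(d)

    shards = [local_outer_product_shard(grads, world_size, r) for r in range(world_size)]
    out = sharded_matvec(shards, query)

    # Check against the dense result, which is built here only for verification.
    assert np.allclose(out, grads.T @ grads @ query)
    print([s.shape for s in shards])
```

Row-sharding keeps per-rank memory at roughly `d * d / world_size` entries, which is what lets the number of nodes grow with the size of the outer product.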

Because the preconditioners don't necessarily fit on a single GPU, we use Gloo to perform the distributed operations on CPU.
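A minimal sketch of a CPU collective over PyTorch's Gloo backend, assuming the implementation uses `torch.distributed` (the PR only names GLOO). Each rank holds its slice of a result as a CPU tensor, and `all_gather` assembles the full vector on every rank without touching GPU memory. To stay self-contained it runs as a single process (`world_size=1`) with a file-based rendezvous; in practice each rank would be a separate process, e.g. launched with `torchrun`. `gather_slices` is a hypothetical helper name.

```python
import os
import tempfile

import torch
import torch.distributed as dist


def gather_slices(local_slice: torch.Tensor) -> torch.Tensor:
    """All-gather equally sized 1-D CPU slices from every rank and concatenate them."""
    world_size = dist.get_world_size()
    # Gloo works on CPU tensors, so the assembled result never needs to fit in VRAM.
    buckets = [torch.empty_like(local_slice) for _ in range(world_size)]
    dist.all_gather(buckets, local_slice)
    return torch.cat(buckets)


def main() -> torch.Tensor:
    # Single-process demo: rendezvous through a not-yet-existing file in a fresh temp dir.
    init_file = os.path.join(tempfile.mkdtemp(), "gloo_init")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{init_file}",
        rank=0,
        world_size=1,
    )
    try:
        local_slice = torch.arange(4, dtype=torch.float32)  # this rank's slice, on CPU
        full = gather_slices(local_slice)
        print(full)
        return full
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

`all_gather` requires every rank to contribute a tensor of the same size, so uneven shards (for example from a row split that does not divide evenly) would need padding before the gather.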

@luciaquirke force-pushed the trackstar-run branch 3 times, most recently from ea0996a to 369dc0d on December 15, 2025 00:08