BP Reordering Codec #15430

msokolov · 2025-11-17T16:31:06Z

This is a stab at implementing the previously-published BpVectorReorderer as part of the codec. The version we have now can be used as a merge policy to reorder docids by doing BP over a vector field. This version allows for the ordering to be done at the vector ordinal level for each field independently while docids may be soprted by some other IndexSort.

The code is fully working as far as I can tell, and I think the design is reasonable, but some TODOs and nocommits remain:

Test results have so far been underwhelming, and I plan to continue to work to see if it is producing results thatare on par with the merge policy version. There might be a bug somewhere, and if anyone is interested in looking, I'd appreciate feedback.
There isn't a convenient way to enable/disable reordering in tests. Currently a few tests will fail (the TestBpVectorReorder in particular since it is reordering both at the codec level and separately in the merge policy.

Use BpVectorReorderer to reorder when merging

…ordering in codec

msokolov · 2025-12-01T14:15:21Z

OK after finally having ironed out the bugs, I have some results. The situation is a little complicated as the change here really doesn't help much with the typical "dense" index where every document has a vector. I think the reason is that any gains are masked by the additional cost of having a node->doc mapping that must be traversed. On the other hand, in the "sparse" case where some documents have no vectors, we already have such a mapping, so we can see the impact of this change more clearly. Net/net we see improvements in search latency, increasing with index size. On indexes of 1-2MM I see 5% improvement, on 10MM, a 10% improvement. As expected, vex files show a decrease in size (about 15%). There is also an increase in vem since that is where we store the new node->doc mapping, but this is pretty small. Merge times go up a lot - this metric varies quite a bit, but seems to be about 100% increase.

It may be possible to reduce the merge times by tweaking the parameters of the BP execution to make it recurse less? I'll see if I can do that while retaining the latency improvements. Then it might be best to enable this only for sparse indexes.

msokolov · 2025-12-01T14:56:12Z

luceneutil results from one run with 10MM cohere 768-dim wiki documents, with a 50% selectivity filter (I made some changes to luceneutil to be able to create an index with 50% "empty" documents):

recall  latency(ms)  netCPU  avgCpuCount  visited  index(s)  index_docs/s  force_merge(s)  index_size(MB)  bp-reorder
 0.923        3.350   3.338        0.996     5990   2754.51       3630.41         2283.57        15169.98       false
 0.921        3.052   3.042        0.997     5691   2858.65       3498.15         4312.79        15123.17        true

It's a little surprising we see recall changes, since the graphs should be the same, merely reordered? They seem to be sometimes worse, sometimes better, and never more than about 0.002 different

dweiss · 2025-12-01T15:28:45Z

They seem to be sometimes worse, sometimes better, and never more than about 0.002 different

Could it be accumulated floating point errors?

benwtrent · 2025-12-01T15:43:15Z

It's a little surprising we see recall changes, since the graphs should be the same, merely reordered?

If a graph builds with seeing the vectors in a different order (which can happen in concurrent merging), I would expect slight recall deviations on lower ef_search parameters.

msokolov · 2025-12-02T12:29:46Z

I've done many runs, and for the most part I'm seeing identical recall, so I think we can safely ignore these occasional variants. It probably is down to order differences from concurrent indexing, as @benwtrent says.

I can also report that the merge time difference is greatly reduced if I enable concurrent reordering by passing the mergestate intra-merge executor in. EG:

recall  latency(ms)  netCPU  avgCpuCount  visited  index(s)  index_docs/s  force_merge(s)  index_size(MB)  bp-reorder
 0.937        3.022   3.002        0.993     5139    155.35       6437.20          129.43         1510.60       false
 0.938        2.679   2.669        0.996     5140    160.86       6216.51          150.13         1507.24        true

and then by reducing the reorder MAX_ITERS from 20 to 10 we can regain some more:

latency(ms)  netCPU  avgCpuCount  visited  index(s)  index_docs/s  force_merge(s)  index_size(MB)  bp-reorder
      3.095   3.085        0.997     5145    156.18       6402.87          127.41         1510.59       false
      2.705   2.695        0.996     5133    160.26       6239.82          144.53         1507.26        true

(note: recall is not being shown in this table because it's identical. I made a local change to the script to hide columns that don't vary)

…clude bitsRequired

Michael Sokolov added 3 commits November 17, 2025 11:24

Enable vector ords to be reordered

62d3d8a

Use BpVectorReorderer to reorder when merging

clean up javadocs; remove some nocommits

f2e2fe2

reorder graph and vectors after merge; pass enableReorder down

d84f699

github-actions bot added module:core/index module:core/search module:core/codecs module:misc module:core/hnsw labels Nov 17, 2025

Michael Sokolov added 3 commits November 19, 2025 06:59

fix vector stride, add constructor for enableReorder, add test for re…

d57de24

…ordering in codec

use existing graph renumbering method

35d3ee3

tidy

32a3490

msokolov mentioned this pull request Nov 24, 2025

BaseLucene80DocValuesFormatTestCase.testNumericFieldJumpTables is way too slow #15444

Closed

write DirectWriter.bitsRequired to index

dc02471

Michael Sokolov added 4 commits December 2, 2025 15:09

use merge executor for concurrent bp-reordering

0a2a9c0

expose BpVectorReorderer.DocMap and avoid copying its arrays

90a1618

Add TestBpVectorReorderer.testReorderCodecSparse

bb72107

fixing tests that got disabled; updating dense reordered format to in…

06eed75

…clude bitsRequired

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

BP Reordering Codec #15430

BP Reordering Codec #15430

msokolov commented Nov 17, 2025

Uh oh!

msokolov commented Dec 1, 2025

Uh oh!

msokolov commented Dec 1, 2025 •

edited

Loading

Uh oh!

dweiss commented Dec 1, 2025

Uh oh!

benwtrent commented Dec 1, 2025

Uh oh!

msokolov commented Dec 2, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

BP Reordering Codec #15430

Are you sure you want to change the base?

BP Reordering Codec #15430

Conversation

msokolov commented Nov 17, 2025

Uh oh!

msokolov commented Dec 1, 2025

Uh oh!

msokolov commented Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dweiss commented Dec 1, 2025

Uh oh!

benwtrent commented Dec 1, 2025

Uh oh!

msokolov commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

msokolov commented Dec 1, 2025 •

edited

Loading

msokolov commented Dec 2, 2025 •

edited

Loading