Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some basic HNSW graph checks to CheckIndex #13984

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
101 changes: 101 additions & 0 deletions lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@
import org.apache.lucene.codecs.StoredFieldsReader;
import org.apache.lucene.codecs.TermVectorsReader;
import org.apache.lucene.codecs.hnsw.FlatVectorsReader;
import org.apache.lucene.codecs.hnsw.HnswGraphProvider;
import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DocumentStoredFieldVisitor;
Expand Down Expand Up @@ -91,6 +92,7 @@
import org.apache.lucene.util.automaton.ByteRunAutomaton;
import org.apache.lucene.util.automaton.CompiledAutomaton;
import org.apache.lucene.util.automaton.Operations;
import org.apache.lucene.util.hnsw.HnswGraph;

/**
* Basic tool and API to check the health of an index and write a new segments file that removes
Expand Down Expand Up @@ -249,6 +251,9 @@ public static class SegmentInfoStatus {
/** Status of vectors */
public VectorValuesStatus vectorValuesStatus;

/** Status of HNSW graph */
public HnswGraphStatus hnswGraphStatus;

/** Status of soft deletes */
public SoftDeletesStatus softDeletesStatus;

Expand Down Expand Up @@ -406,6 +411,21 @@ public static final class VectorValuesStatus {
public Throwable error;
}

/** Status from testing HNSW graph */
public static final class HnswGraphStatus {

HnswGraphStatus() {}

/** Number of nodes in the graph */
public int hnswGraphSize;
benchaplin marked this conversation as resolved.
Show resolved Hide resolved

/** Number of levels of the graph */
public int hsnwGraphNumLevels;
benchaplin marked this conversation as resolved.
Show resolved Hide resolved

/** Exception thrown during vector values test (null on success) */
public Throwable error;
}

/** Status from testing index sort */
public static final class IndexSortStatus {
IndexSortStatus() {}
Expand Down Expand Up @@ -1085,6 +1105,9 @@ private Status.SegmentInfoStatus testSegment(
// Test FloatVectorValues and ByteVectorValues
segInfoStat.vectorValuesStatus = testVectors(reader, infoStream, failFast);

// Test HNSW graph
segInfoStat.hnswGraphStatus = testHnswGraph(reader, infoStream, failFast);

// Test Index Sort
if (indexSort != null) {
segInfoStat.indexSortStatus = testSort(reader, indexSort, infoStream, failFast);
Expand Down Expand Up @@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors(
return status;
}

private static HnswGraph getHnswGraph(CodecReader reader) throws IOException {
KnnVectorsReader vectorsReader = reader.getVectorReader();
if (vectorsReader instanceof PerFieldKnnVectorsFormat.FieldsReader) {
vectorsReader = ((PerFieldKnnVectorsFormat.FieldsReader) vectorsReader).getFieldReader("knn");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm the field can be anything (not necessarily knn like luceneutil uses)? Can we change this so we iterate FieldInfos and for any field that has KNN vectors enabled (FieldInfo.hasVectorValues, hmm but also has the HNSW graph (since I think a field can be only "flat" vectors?)) we do this check?

Copy link
Contributor Author

@benchaplin benchaplin Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I didn't quite understand fields when I wrote this - I think I get it now. Alright, I've done what you suggested (as is also done in testVectors) and iterated over FieldInfos, performing the check only when it applies.

Because we might now parse several HNSW graphs, I've restructured the status object to support per-graph data. Successful output will now look like:

    test: open reader.........OK [took 0.010 sec]
    test: check integrity.....OK [took 2.216 sec]
    test: check live docs.....OK [took 0.000 sec]
    test: field infos.........OK [2 fields] [took 0.000 sec]
    test: field norms.........OK [0 fields] [took 0.000 sec]
    test: terms, freq, prox...    test: stored fields.......OK [1500000 total field count; avg 1.0 fields per doc] [took 0.390 sec]
    test: term vectors........OK [0 total term vector count; avg 0.0 term/freq vector fields per doc] [took 0.000 sec]
    test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET; 0 SKIPPING INDEX] [took 0.000 sec]
    test: points..............OK [0 fields, 0 points] [took 0.000 sec]
    test: vectors.............OK [1 fields, 1500000 vectors] [took 0.496 sec]
    test: hnsw graphs.........OK [2 fields: (field name: knn1, levels: 4, total nodes: 1547684), (field name: knn2, levels: 4, total nodes: 1547684)] [took 0.979 sec]

testVectors doesn't do this, it just sums vectors over all fields. I could do that too, but this felt most complete.

if (vectorsReader instanceof HnswGraphProvider) {
return ((HnswGraphProvider) vectorsReader).getGraph("knn");
}
}
return null;
}

/** Test the HNSW graph. */
public static Status.HnswGraphStatus testHnswGraph(
CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException {
if (infoStream != null) {
infoStream.print(" test: hnsw graph..........");
}
long startNS = System.nanoTime();
Status.HnswGraphStatus status = new Status.HnswGraphStatus();

try {
HnswGraph hnswGraph = getHnswGraph(reader);
if (hnswGraph != null) {
final int numLevels = hnswGraph.numLevels();
// Perform tests on each level of the HNSW graph
for (int level = numLevels - 1; level >= 0; level--) {
HnswGraph.NodesIterator nodesIterator = hnswGraph.getNodesOnLevel(level);
while (nodesIterator.hasNext()) {
int node = nodesIterator.nextInt();
hnswGraph.seek(level, node);
int nbr, lastNeighbor = -1, firstNeighbor = -1;
while ((nbr = hnswGraph.nextNeighbor()) != NO_MORE_DOCS) {
if (firstNeighbor == -1) {
firstNeighbor = nbr;
}
if (nbr < lastNeighbor) {
throw new CheckIndexException(
"Neighbors out of order for node "
+ node
+ ": "
+ nbr
+ "<"
+ lastNeighbor
+ " 1st="
+ firstNeighbor);
} else if (nbr == lastNeighbor) {
throw new CheckIndexException(
"There are repeated neighbors of node " + node + " with value " + nbr);
}
lastNeighbor = nbr;
}
status.hnswGraphSize++;
}
status.hsnwGraphNumLevels++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of incrementing here, you could just set it to numLevels from above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but wanted to respect the object as a "status" - so if a check fails, numLevels will indicate on which level it failed. However we don't print the status if an exception is thrown. What do you think?

}
msg(
infoStream,
String.format(
Locale.ROOT,
"OK [%d levels, %d nodes (over all levels)] [took %.3f sec]",
status.hsnwGraphNumLevels,
status.hnswGraphSize,
nsToSec(System.nanoTime() - startNS)));
}
} catch (Throwable e) {
if (failFast) {
throw IOUtils.rethrowAlways(e);
}
msg(infoStream, "ERROR: " + e);
status.error = e;
if (infoStream != null) {
e.printStackTrace(infoStream);
}
}

return status;
}

private static boolean vectorsReaderSupportsSearch(CodecReader codecReader, String fieldName) {
KnnVectorsReader vectorsReader = codecReader.getVectorReader();
if (vectorsReader instanceof PerFieldKnnVectorsFormat.FieldsReader perFieldReader) {
Expand Down