fix(python): dispatch vindex infer through inference weights #174

Open
citizenu03bb wants to merge 1 commit into chrishayuk:main from citizenu03bb:pr/python-kquant-infer-dispatch

@citizenu03bb

Title:
fix(python): dispatch vindex infer through inference weights

Summary:

  • Make Python Vindex lazy-load InferenceWeights instead of dense-only mmap weights for infer.
  • Route infer and infer_trace through InferenceWeights::infer_patched.
  • Keep dense-only analysis helpers using InferenceWeights::as_weights().

Why:
The Python Vindex.infer path used the dense mmap loader directly. That loader
does not understand Q4K/kquant vindex weight files, so quantized indexes could
be loaded through the wrong path and produce incorrect predictions and traces.
The shared InferenceWeights loader already detects the format and dispatches to
dense or Q4K inference as appropriate.
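
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of lazy loading plus format dispatch. It is not the code in this PR. `SketchVindex`, `SketchInferenceWeights`, `SketchDenseWeights`, `WeightFormat`, the file-suffix detection rule, and every method signature are hypothetical stand-ins that only mirror names from the summary; the real larql types and their signatures may differ.

```rust
use std::sync::OnceLock;

/// Hypothetical weight formats; the real loader's detection logic is not shown in this PR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WeightFormat {
    Dense,
    Q4K,
}

/// Hypothetical stand-in for the dense view used by analysis helpers.
#[derive(Debug)]
struct SketchDenseWeights {
    num_layers: usize,
}

/// Hypothetical stand-in for the shared loader's result (not the real InferenceWeights).
#[derive(Debug)]
struct SketchInferenceWeights {
    format: WeightFormat,
    dense: SketchDenseWeights,
}

impl SketchInferenceWeights {
    /// Assumed detection rule for illustration only: a ".q4k" suffix means quantized.
    fn load(path: &str) -> Self {
        let format = if path.ends_with(".q4k") {
            WeightFormat::Q4K
        } else {
            WeightFormat::Dense
        };
        SketchInferenceWeights {
            format,
            dense: SketchDenseWeights { num_layers: 2 },
        }
    }

    /// Dispatch on the detected format; the returned strings are placeholders
    /// that show which branch ran, not real model output.
    fn infer_patched(&self, prompt: &str) -> String {
        match self.format {
            WeightFormat::Dense => format!("dense:{prompt}"),
            WeightFormat::Q4K => format!("q4k:{prompt}"),
        }
    }

    /// Borrow the dense view for helpers that only need dense weights.
    fn as_weights(&self) -> &SketchDenseWeights {
        &self.dense
    }
}

/// Hypothetical index wrapper that loads weights on first use.
struct SketchVindex {
    path: String,
    weights: OnceLock<SketchInferenceWeights>,
}

impl SketchVindex {
    fn new(path: &str) -> Self {
        SketchVindex {
            path: path.to_string(),
            weights: OnceLock::new(),
        }
    }

    /// Load once, on the first call, then reuse the cached value.
    fn weights(&self) -> &SketchInferenceWeights {
        self.weights
            .get_or_init(|| SketchInferenceWeights::load(&self.path))
    }

    /// Inference always goes through the format-aware loader.
    fn infer(&self, prompt: &str) -> String {
        self.weights().infer_patched(prompt)
    }

    /// A dense-only helper that reuses the same cached weights.
    fn num_layers(&self) -> usize {
        self.weights().as_weights().num_layers
    }
}
```

In this sketch, both the inference path and the dense-only helper share one cached loader result, so the format is detected once per index and nothing is loaded until a caller needs it.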

Verification:

  • cargo check -p larql-python
  • cargo test -p larql-python --lib

Branch:
pr/python-kquant-infer-dispatch

Commit:
e4fe4fd
