Speed up grpc fetch and query response parsing #537
Closed
yorickvP wants to merge 1 commit into pinecone-io:main from
Conversation
Running a profiler on my Pinecone program showed it was CPU-bottlenecked on `json_format.MessageToDict` (it could only handle about 100 vectors per second in query and fetch responses). It turns out that converting the embeddings to a dict this way is very slow. It's much faster to convert them to a list without going through `MessageToDict`.
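A minimal sketch of the faster pattern, using `types.SimpleNamespace` stand-ins for the generated gRPC messages so the example is self-contained (the real SDK code operates on protobuf objects; the field names here mirror a fetch response but are assumptions for illustration):

```python
from types import SimpleNamespace

def parse_fetch_response_sketch(response):
    # Read fields straight off the message instead of converting the
    # whole response with json_format.MessageToDict.
    vectors = {}
    for vec_id, vec in response.vectors.items():
        vectors[vec_id] = {
            "id": vec.id,
            "values": list(vec.values),  # bulk-convert the repeated field
        }
    return {"namespace": response.namespace, "vectors": vectors}

# Stand-in for a FetchResponse message (illustration only).
resp = SimpleNamespace(
    namespace="example-ns",
    vectors={"v1": SimpleNamespace(id="v1", values=[0.1, 0.2, 0.3])},
)
parsed = parse_fetch_response_sketch(resp)
```

The key point is that `list(vec.values)` converts the repeated field in one call, avoiding the per-element JSON-compatible conversion `MessageToDict` performs.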
jhamon added a commit that referenced this pull request on Nov 18, 2025
## Problem

The current implementation uses `json_format.MessageToDict` to convert entire protobuf messages to dictionaries when parsing gRPC responses. This is a significant CPU bottleneck when processing large numbers of vectors, as reported in PR #537, where users experienced ~100 vectors/second throughput.

The `MessageToDict` conversion is expensive because it:

1. Serializes the entire protobuf message to JSON
2. Deserializes it back into a Python dictionary
3. Does this for every field, even when we only need specific fields

Additionally, several other performance issues were identified:

- Metadata conversion using `MessageToDict` on `Struct` messages
- Inefficient list construction (append vs pre-allocation)
- Unnecessary dict creation for `SparseValues` parsing
- Response header processing overhead

## Solution

Optimized all gRPC response parsing functions in `pinecone/grpc/utils.py` to directly access protobuf fields instead of converting entire messages to dictionaries. This approach:

1. **Directly accesses protobuf fields**: Uses `response.vectors`, `response.matches`, `response.namespace`, etc. directly
2. **Optimized metadata conversion**: Created a `_struct_to_dict()` helper that directly accesses `Struct` fields (~1.5-2x faster than `MessageToDict`)
3. **Pre-allocates lists**: Uses `[None] * len()` for known-size lists (~6.5% improvement)
4. **Direct SparseValues creation**: Creates `SparseValues` objects directly instead of going through dict conversion (~410x faster)
5. **Caches protobuf attributes**: Stores repeated attribute accesses in local variables
6. **Optimized response info extraction**: Improved `extract_response_info()` performance with module-level constants and early returns
7. **Maintains backward compatibility**: Output format remains identical to the previous implementation

## Performance Impact

Performance testing of the response parsing functions shows significant improvements across all optimized functions.
## Changes

### Modified Files

- `pinecone/grpc/utils.py`: Optimized 9 response parsing functions with direct protobuf field access
  - Added `_struct_to_dict()` helper for optimized metadata conversion (~1.5-2x faster)
  - Pre-allocated lists where size is known (~6.5% improvement)
  - Direct `SparseValues` creation (removed dict conversion overhead)
  - Cached protobuf message attributes
  - Removed dead code paths (dict fallback in `parse_usage`)
- `pinecone/grpc/index_grpc.py`: Updated to pass protobuf messages directly to parse functions
- `pinecone/grpc/resources/vector_grpc.py`: Updated to pass protobuf messages directly to parse functions
- `pinecone/utils/response_info.py`: Optimized `extract_response_info()` with module-level constants and early returns
- `tests/perf/test_fetch_response_optimization.py`: New performance tests for fetch response parsing
- `tests/perf/test_query_response_optimization.py`: New performance tests for query response parsing
- `tests/perf/test_other_parse_methods.py`: New performance tests for all other parse methods
- `tests/perf/test_grpc_parsing_perf.py`: Extended with additional benchmarks

### Technical Details

**Core Optimizations**:

1. **`_struct_to_dict()` Helper Function**:
   - Directly accesses protobuf `Struct` and `Value` fields
   - Handles all value types (null, number, string, bool, struct, list)
   - Recursively processes nested structures
   - ~1.5-2x faster than `json_format.MessageToDict` for metadata conversion
2. **List Pre-allocation**:
   - `parse_query_response`: Pre-allocates the `matches` list with `[None] * len(matches_proto)`
   - `parse_list_namespaces_response`: Pre-allocates the `namespaces` list
   - ~6.5% performance improvement over append-based construction
3. **Direct SparseValues Creation**:
   - Replaced `parse_sparse_values(dict)` with direct `SparseValues(indices=..., values=...)` creation
   - ~410x faster (avoids dict creation and conversion overhead)

## Testing

- All existing unit tests pass (224 tests in `tests/unit_grpc`)
- Comprehensive pytest benchmark tests added for all optimized functions:
  - `test_fetch_response_optimization.py`: Tests for fetch response with varying metadata sizes
  - `test_query_response_optimization.py`: Tests for query response with varying match counts, dimensions, metadata sizes, and sparse vectors
  - `test_other_parse_methods.py`: Tests for all other parse methods (fetch_by_metadata, list_namespaces, stats, upsert, update, namespace_description)
- Mypy type checking passes with and without grpc extras (with types extras)
- No breaking changes: output format remains identical

## Related

This addresses the performance issue reported in PR #537, implementing a similar optimization approach but adapted for the current codebase structure. All parse methods have been optimized with comprehensive performance testing to verify improvements.
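The direct-creation pattern for sparse values can be illustrated like this; `SparseValues` here is a simplified dataclass stand-in for the SDK model, and the message object is simulated with `SimpleNamespace`:

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import List

@dataclass
class SparseValues:
    """Simplified stand-in for the SDK's SparseValues model."""
    indices: List[int]
    values: List[float]

def parse_sparse_values_direct(sparse_proto):
    # Build the model straight from the repeated fields; the old path
    # first materialized an intermediate dict and then parsed that.
    return SparseValues(
        indices=list(sparse_proto.indices),
        values=list(sparse_proto.values),
    )

# Stand-in for a SparseValues protobuf message (illustration only).
proto = SimpleNamespace(indices=[1, 5, 9], values=[0.5, 0.25, 0.125])
sv = parse_sparse_values_direct(proto)
```

Constructing the object in one step avoids allocating and then re-walking a throwaway dict for every vector, which is where the reported overhead came from.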
## Problem

Running a profiler on my pinecone-using application, I found it was CPU-bottlenecked on `json_format.MessageToDict` (handling only about 100 vectors per second in query and fetch responses). It turns out that converting the embeddings to a dict this way is very slow. It's much faster to convert them to a list without going through `MessageToDict`.

## Solution

Changed `parse_fetch_response` and `parse_query_response` to directly read the protobuf structure instead of going through `MessageToDict`.

## Type of Change
## Test Plan

`make test-grpc-unit`
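As an illustration of the kind of check this test plan implies (the output format must stay identical after the optimization), a hypothetical pytest-style equivalence test might compare pre-allocated and append-based parsing over stand-in messages; all names here are invented for the sketch:

```python
from types import SimpleNamespace

def parse_matches_preallocated(response):
    # Optimized pattern: pre-allocate when the output size is known.
    matches_proto = response.matches
    out = [None] * len(matches_proto)
    for i, m in enumerate(matches_proto):
        out[i] = {"id": m.id, "score": m.score}
    return out

def parse_matches_append(response):
    # Baseline append-based construction for the equivalence check.
    return [{"id": m.id, "score": m.score} for m in response.matches]

def test_parsers_agree():
    resp = SimpleNamespace(matches=[
        SimpleNamespace(id="a", score=0.9),
        SimpleNamespace(id="b", score=0.7),
    ])
    assert parse_matches_preallocated(resp) == parse_matches_append(resp)

test_parsers_agree()  # also runs as a plain script, outside pytest
```

A test like this guards the "no breaking changes" claim: if the optimized path ever diverged from the old output shape, the equality assertion would fail.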