Migrate create_index_for_model from records plugin #438
Merged
jhamon merged 6 commits into release-candidate/2025-01 on Jan 29, 2025
austin-denoble approved these changes on Jan 29, 2025
austin-denoble (Contributor) left a comment:
Great work, big thanks again for all the plugin porting. Loving the test expansion, and general UX improvement through enums, interface flexibility, etc. 🎉
            """
            return self.__dict__

        def __init__(
Contributor
Thanks very much for improving this generally, I don't think I left things in the best state in the plugin itself.
Comment on lines +16 to +20
    try:
        from pinecone_plugins.records import __installables__  # type: ignore

        if __installables__ is not None:
            raise DeprecatedPluginError("pinecone-plugin-records")
Contributor
Really like your solution for this. 👍
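For readers skimming the thread, here is a minimal sketch of how the excerpted check might look as a complete guard. The diff excerpt stops before the `except` clause, so the `ImportError` handling, the `check_for_deprecated_plugins` function name, and the body of `DeprecatedPluginError` are all assumptions for illustration, not the code merged in this PR.

```python
class DeprecatedPluginError(Exception):
    """Stand-in for the SDK's error type; the message wording is an assumption."""

    def __init__(self, plugin_name: str) -> None:
        super().__init__(
            f"The plugin '{plugin_name}' is deprecated; uninstall it to continue."
        )
        self.plugin_name = plugin_name


def check_for_deprecated_plugins() -> None:
    """Raise if the legacy records plugin is importable; otherwise do nothing.

    Hypothetical helper name, used only for this sketch.
    """
    try:
        from pinecone_plugins.records import __installables__  # type: ignore
    except ImportError:
        # The plugin is not installed, which is the expected state.
        return
    if __installables__ is not None:
        raise DeprecatedPluginError("pinecone-plugin-records")
```

Probing with an import keeps the check cheap: environments without the plugin fall straight through, and only environments that still have it installed see the error.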
jhamon added a commit that referenced this pull request on Jan 30, 2025
## Problem

Migrating `search_records` (aliased to `search`) and `upsert_records` from the `pinecone-plugin-records` plugin.

## Solution

Working off the content of the records plugin, I have done the following:

- Adjusted the codegen script to fix the way openapi generator handles underscore fields such as `_id` and `_score`
- Adjusted the rest library code in `rest_urllib3.py` and `rest_aiohttp.py` to handle record uploading with content-type `application/x-ndjson`
- Copied and modified the integration tests from the plugin
- Extracted a lot of the guts of the `upsert_records` and `search_records` methods into the request factory where they could more easily be unit tested. The logic around parsing user inputs into the openapi request objects is surprisingly complicated, so I added quite a lot of new unit tests checking some of those edge cases.
- Compared to the plugin implementation, the major changes are:
  - Made `search` an alias of `search_records`
  - Moved away from usages of `.pop()`, which mutates the input objects; this could be confusing for users if they are using those objects for anything else
  - Added better typing of dict fields
  - Incorporated optional use of enum values for `RerankModel`
  - Added asyncio variants of these methods, although most of the guts are shared in the request factory.

I already handled disallowing the records plugin in yesterday's PR #438

## Usage

```python
from pinecone import Pinecone, CloudProvider, AwsRegion, EmbedModel, RerankModel

pc = Pinecone(api_key="key")

# Create an index for your embedding model
index_model = pc.create_index_for_model(
    name="my-model-index",
    cloud=CloudProvider.AWS,
    region=AwsRegion.US_EAST_1,
    embed={
        "model": EmbedModel.Multilingual_E5_Large,
        "field_map": {"text": "my_text_field"}
    }
)

# Create an index client
index = pc.Index(host=index_model.host)

# Upsert records
namespace = "target-namespace"
index.upsert_records(
    namespace=namespace,
    records=[
        {
            "_id": "test1",
            "my_text_field": "Apple is a popular fruit known for its sweetness and crisp texture.",
        },
        {
            "_id": "test2",
            "my_text_field": "The tech company Apple is known for its innovative products like the iPhone.",
        },
        {
            "_id": "test3",
            "my_text_field": "Many people enjoy eating apples as a healthy snack.",
        },
        {
            "_id": "test4",
            "my_text_field": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces.",
        },
        {
            "_id": "test5",
            "my_text_field": "An apple a day keeps the doctor away, as the saying goes.",
        },
        {
            "_id": "test6",
            "my_text_field": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership.",
        },
    ],
)

# Search for similar records
response = index.search(
    namespace=namespace,
    query={
        "inputs":{
            "text": "Apple corporation",
        },
        "top_k":3,
    },
    rerank={
        "model": RerankModel.Bge_Reranker_V2_M3,
        "rank_fields": ["my_text_field"],
        "top_n": 3,
    },
)
```

These methods also have asyncio variants available

```python
import asyncio

from pinecone import Pinecone, RerankModel

async def main():
    # Create an index client
    pc = Pinecone(api_key='key')
    index = pc.AsyncioIndex(host='host')

    # Upsert records
    namespace = "target-namespace"
    records = [
        {
            "_id": "test1",
            "my_text_field": "Apple is a popular fruit known for its sweetness and crisp texture.",
        },
        {
            "_id": "test2",
            "my_text_field": "The tech company Apple is known for its innovative products like the iPhone.",
        },
        {
            "_id": "test3",
            "my_text_field": "Many people enjoy eating apples as a healthy snack.",
        },
        {
            "_id": "test4",
            "my_text_field": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces.",
        },
        {
            "_id": "test5",
            "my_text_field": "An apple a day keeps the doctor away, as the saying goes.",
        },
        {
            "_id": "test6",
            "my_text_field": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership.",
        },
    ]
    await index.upsert_records(
        namespace=namespace,
        records=records,
    )

    # Search for similar records
    response = await index.search(
        namespace=namespace,
        query={
            "inputs":{
                "text": "Apple corporation",
            },
            "top_k":3,
        },
        rerank={
            "model": RerankModel.Bge_Reranker_V2_M3,
            "rank_fields": ["my_text_field"],
            "top_n": 3,
        },
    )

asyncio.run(main())
```

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [ ] None of the above: (explain here)
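
The point in the commit message about moving away from `.pop()` can be illustrated with a small sketch. Both function names below are hypothetical and the parsing is simplified; this is not the request factory code from the commit, only a demonstration of why reading keys without removing them is friendlier to callers.

```python
from typing import Any, Dict


def parse_query_mutating(query: Dict[str, Any]) -> Dict[str, Any]:
    """Plugin-style parsing (hypothetical): pop() removes keys from the caller's dict."""
    return {"top_k": query.pop("top_k"), "inputs": query.pop("inputs", None)}


def parse_query_non_mutating(query: Dict[str, Any]) -> Dict[str, Any]:
    """Read with indexing and get() so the caller's dict is left untouched."""
    return {"top_k": query["top_k"], "inputs": query.get("inputs")}


user_query = {"inputs": {"text": "Apple corporation"}, "top_k": 3}

# The caller can keep reusing user_query after this call.
parsed = parse_query_non_mutating(user_query)
```

With the mutating variant, a second search using the same `user_query` dict would fail with a `KeyError`, because the first call already removed `top_k`.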
## Problem

We need to migrate `create_index_for_model` functionality from the records plugin into the core of the SDK to provide improved UX around code completions and error handling.

## Solution

- `create_index_for_model` and `IndexEmbed` from records plugin
- `create_index` method to reduce the amount of duplication.

## Todo

## Usage
These would all be considered valid usage. Enums are available to help know what values are accepted, but you can type the literal strings if you prefer. This flexibility also keeps compatibility with existing usage of the plugin.
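
As a rough illustration of the enum-or-string flexibility described here, the sketch below uses a stand-in `CloudProvider` enum and a hypothetical `normalize_cloud` helper. Neither is taken from the SDK source; they only show one way an interface could accept both forms and treat them identically.

```python
from enum import Enum
from typing import Union


class CloudProvider(Enum):
    """Stand-in enum for illustration; not imported from the SDK."""

    AWS = "aws"
    GCP = "gcp"
    AZURE = "azure"


def normalize_cloud(cloud: Union[CloudProvider, str]) -> str:
    """Return the plain string for either an enum member or a literal string.

    Hypothetical helper, named only for this sketch.
    """
    if isinstance(cloud, CloudProvider):
        return cloud.value
    return cloud


# Both spellings resolve to the same request value.
from_enum = normalize_cloud(CloudProvider.AWS)
from_string = normalize_cloud("aws")
```

Accepting plain strings keeps existing plugin-style calls working, while the enum gives editors something to autocomplete.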
## Type of Change