
Commit

catchup with main
KCaverly committed Mar 25, 2024
2 parents bfdf273 + 2dacce4 commit b3a1b14
Showing 26 changed files with 881 additions and 271 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@ Then install the package through poetry:
Note - You may need to install poetry. See [here](https://python-poetry.org/docs/#installing-with-the-official-installer)

```bash
poetry install --with dev
```

## Testing
34 changes: 34 additions & 0 deletions docs/api/language_model_clients/Mistral.md
@@ -0,0 +1,34 @@
---
sidebar_position: 9
---

# dsp.Mistral

### Usage

```python
lm = dsp.Mistral(model='mistral-medium-latest', api_key="your-mistralai-api-key")
```

### Constructor

The constructor initializes the base class `LM` and verifies the `api_key`, which may be provided explicitly or defined through the `MISTRAL_API_KEY` environment variable.

```python
class Mistral(LM):
def __init__(
self,
model: str = "mistral-medium-latest",
api_key: Optional[str] = None,
**kwargs,
):
```
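The key-resolution behavior described above can be sketched in isolation. This is an illustrative stand-in, not the actual dsp implementation; the helper name `resolve_api_key` is hypothetical:

```python
import os

def resolve_api_key(api_key=None):
    # An explicit argument wins; otherwise fall back to the
    # MISTRAL_API_KEY environment variable, as the constructor does.
    key = api_key or os.environ.get("MISTRAL_API_KEY")
    if not key:
        raise ValueError("No API key provided and MISTRAL_API_KEY is not set.")
    return key
```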

**Parameters:**
- `model` (_str_): Name of the Mistral AI pretrained model to use. Defaults to `mistral-medium-latest`.
- `api_key` (_Optional[str]_, _optional_): API key for authenticating with Mistral AI. Defaults to None, in which case the key is read from the `MISTRAL_API_KEY` environment variable.
- `**kwargs`: Additional language model arguments to pass to the API provider.

### Methods

Refer to [`dspy.Mistral`](#) documentation.
9 changes: 7 additions & 2 deletions docs/api/retrieval_model_clients/AzureCognitiveSearch.md
@@ -6,7 +6,7 @@ sidebar_position: 3

### Constructor

The constructor initializes an instance of the `AzureCognitiveSearch` class and sets up parameters for sending queries and retrieving results with the Azure Cognitive Search server.

```python
class AzureCognitiveSearch:
@@ -21,6 +21,7 @@ class AzureCognitiveSearch:
```

**Parameters:**

- `search_service_name` (_str_): Name of Azure Cognitive Search server.
- `search_api_key` (_str_): API Authentication token for accessing Azure Cognitive Search server.
- `search_index_name` (_str_): Name of search index in the Azure Cognitive Search server.
@@ -31,4 +32,8 @@ class AzureCognitiveSearch:

Refer to [ColBERTv2](/api/retrieval_model_clients/ColBERTv2) documentation. Keep in mind there is no `simplify` flag for AzureCognitiveSearch.

AzureCognitiveSearch supports sending queries and processing the received results, mapping content and scores into the format expected by the DSP framework.

### Deprecation Notice

This module is scheduled for removal in future releases. Please use the `AzureAISearchRM` class from `dspy.retrieve.azureaisearch_rm` instead. For more information, refer to the updated documentation (docs/docs/deep-dive/retrieval_models_clients/Azure.mdx).
4 changes: 2 additions & 2 deletions docs/api/retrieval_model_clients/ChromadbRM.md
@@ -41,7 +41,7 @@ Search the chromadb collection for the top `k` passages matching the given query
ChromadbRM offers the flexibility to use a variety of embedding functions, as outlined in the [chromadb embeddings documentation](https://docs.trychroma.com/embeddings). While different options are available, this example demonstrates how to utilize OpenAI embeddings specifically.

```python
from dspy.retrieve.chromadb_rm import ChromadbRM
import os
import openai
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
@@ -62,4 +62,4 @@ results = retriever_model("Explore the significance of quantum computing", k=5)

for result in results:
print("Document:", result.long_text, "\n")
```
13 changes: 7 additions & 6 deletions docs/docs/building-blocks/1-language_models.md
@@ -18,10 +18,6 @@ For example, to use OpenAI language models, you can do it as follows.
gpt3_turbo = dspy.OpenAI(model='gpt-3.5-turbo-1106', max_tokens=300)
dspy.configure(lm=gpt3_turbo)
```

## Directly calling the LM.

@@ -31,11 +27,16 @@ You can simply call the LM with a string to give it a raw prompt, i.e. a string.
gpt3_turbo("hello! this is a raw prompt to GPT-3.5")
```

**Output:**
```text
['Hello! How can I assist you today?']
```

This is almost never the recommended way to interact with LMs in DSPy, but it is allowed.

## Using the LM with DSPy signatures.

You can also use the LM via DSPy [`signature` (input/output spec)](https://dspy-docs.vercel.app/docs/building-blocks/signatures) and [`modules`](https://dspy-docs.vercel.app/docs/building-blocks/modules), which we discuss in more depth in the remaining guides.

```python
# Define a module (ChainOfThought) and assign it a signature (return an answer, given a question).
@@ -172,4 +173,4 @@ model = 'dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1'
model_path = 'dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-cuda.so'

llama = dspy.ChatModuleClient(model=model, model_path=model_path)
```
2 changes: 1 addition & 1 deletion docs/docs/building-blocks/2-signatures.md
@@ -6,7 +6,7 @@ sidebar_position: 2

When we assign tasks to LMs in DSPy, we specify the behavior we need as a Signature.

**A signature is a declarative specification of input/output behavior of a DSPy module.** Signatures allow you to tell the LM _what_ it needs to do, rather than specify _how_ we should ask the LM to do it.


You're probably familiar with function signatures, which specify the input and output arguments and their types. DSPy signatures are similar, but the differences are that:
2 changes: 1 addition & 1 deletion docs/docs/building-blocks/4-data.md
@@ -78,7 +78,7 @@ input_key_only = article_summary.inputs()
non_input_key_only = article_summary.labels()

print("Example object with Input fields only:", input_key_only)
print("Example object with Non-Input fields only:", non_input_key_only)
```

**Output**
2 changes: 1 addition & 1 deletion docs/docs/building-blocks/solving_your_task.md
@@ -8,7 +8,7 @@ Using DSPy well for solving a new task is just doing good machine learning with

What this means is that it's an iterative process. You make some initial choices, which will be sub-optimal, and then you refine them incrementally.

As we discuss below, you will define your task and the metrics you want to maximize, and prepare a few example inputs — typically without labels (or only with labels for the final outputs, if your metric requires them). Then, you build your pipeline by selecting built-in layers [(`modules`)](https://dspy-docs.vercel.app/docs/building-blocks/modules) to use, giving each layer a [`signature` (input/output spec)](https://dspy-docs.vercel.app/docs/building-blocks/signatures), and then calling your modules freely in your Python code. Lastly, you use a DSPy [`optimizer`](https://dspy-docs.vercel.app/docs/building-blocks/optimizers) to compile your code into high-quality instructions, automatic few-shot examples, or updated LM weights for your LM.


## 1) Define your task.
2 changes: 1 addition & 1 deletion docs/docs/deep-dive/data-handling/examples.mdx
@@ -68,7 +68,7 @@ input_key_only = article_summary.inputs()
non_input_key_only = article_summary.labels()

print("Example object with Input fields only:", input_key_only)
print("Example object with Non-Input fields only:", non_input_key_only)
```

**Output**
124 changes: 87 additions & 37 deletions docs/docs/deep-dive/retrieval_models_clients/Azure.mdx
@@ -4,80 +4,130 @@ sidebar_position: 2

import AuthorDetails from '@site/src/components/AuthorDetails';

# AzureAISearch

A retrieval module that utilizes Azure AI Search to retrieve top passages for a given query.

## Prerequisites

```bash
pip install azure-core
pip install azure-search-documents
```

## Setting up the AzureAISearchRM Client

The constructor initializes an instance of the `AzureAISearchRM` class and sets up parameters for sending queries and retrieving results with the Azure AI Search server.

- `search_service_name` (str): The name of the Azure AI Search service.
- `search_api_key` (str): The API key for accessing the Azure AI Search service.
- `search_index_name` (str): The name of the search index in the Azure AI Search service.
- `field_text` (str): The name of the field containing text content in the search index. This field will be mapped to the "content" field in the dsp framework.
- `k` (int, optional): The default number of top passages to retrieve. Defaults to 3.
- `semantic_ranker` (bool, optional): Whether to use semantic ranking. Defaults to False.
- `filter` (str, optional): Additional filter query. Defaults to None.
- `query_language` (str, optional): The language of the query. Defaults to "en-Us".
- `query_speller` (str, optional): The speller mode. Defaults to "lexicon".
- `use_semantic_captions` (bool, optional): Whether to use semantic captions. Defaults to False.
- `query_type` (Optional[QueryType], optional): The type of query. Defaults to QueryType.FULL.
- `semantic_configuration_name` (str, optional): The name of the semantic configuration. Defaults to None.

Available query types:

- `SIMPLE`: Uses the simple query syntax for searches. Search text is interpreted using a simple query language that allows for symbols such as `+`, `*`, and `""`. Queries are evaluated across all searchable fields by default, unless the `searchFields` parameter is specified.
- `FULL`: Uses the full Lucene query syntax for searches. Search text is interpreted using the Lucene query language, which allows field-specific and weighted searches, as well as other advanced features.
- `SEMANTIC`: Best suited for queries expressed in natural language as opposed to keywords. Improves the precision of search results by re-ranking the top search results using a ranking model trained on the Web corpus.

More details: https://learn.microsoft.com/en-us/azure/search/search-query-overview

Example of the AzureAISearchRM constructor:

```python
AzureAISearchRM(
search_service_name: str,
search_api_key: str,
search_index_name: str,
field_text: str,
k: int = 3,
semantic_ranker: bool = False,
filter: str = None,
query_language: str = "en-Us",
query_speller: str = "lexicon",
use_semantic_captions: bool = False,
query_type: Optional[QueryType] = QueryType.FULL,
semantic_configuration_name: str = None
)
```

## Under the Hood

### `forward(self, query_or_queries: Union[str, List[str]], k: Optional[int] = None) -> dspy.Prediction`

**Parameters:**

- `query_or_queries` (Union[str, List[str]]): The query or queries to search for.
- `k` (_Optional[int]_, _optional_): The number of results to retrieve. If not specified, defaults to the value set during initialization.

**Returns:**
- `dspy.Prediction`: Contains the retrieved passages, each represented as a `dotdict` with a `long_text` attribute.

Internally, the method handles the specifics of preparing the request query to the Azure AI Search service and corresponding payload to obtain the response.

The function handles the retrieval of the top-k passages based on the provided query.
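As a rough sketch of the result shape, the mapping from raw hits to passages can be illustrated as follows. The `dotdict` stand-in and the `to_passages` helper here are illustrative, not the exact dsp internals:

```python
class dotdict(dict):
    # Minimal stand-in for dsp's dotdict: attribute-style access to keys.
    __getattr__ = dict.get

def to_passages(hits, field_text="content"):
    # Each raw search hit becomes a dotdict whose `long_text`
    # attribute carries the configured text field.
    return [dotdict({**hit, "long_text": hit[field_text]}) for hit in hits]
```

This mirrors how each retrieved passage exposes a `long_text` attribute in the returned `dspy.Prediction`.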

## Sending Retrieval Requests via AzureAISearchRM Client

1. _**Recommended**_ Configure default RM using `dspy.configure`.

This allows you to define programs in DSPy and have DSPy internally conduct retrieval on your queries using the configured RM.

```python
import dspy
from dspy.retrieve.azureaisearch_rm import AzureAISearchRM

azure_search = AzureAISearchRM(
    "search_service_name",
    "search_api_key",
    "search_index_name",
    "field_text",
    k=3
)

dspy.settings.configure(rm=azure_search)
retrieve = dspy.Retrieve(k=3)
retrieval_response = retrieve("What is Thermodynamics").passages

for result in retrieval_response:
print("Text:", result, "\n")
```

2. Generate responses using the client directly.

```python
import dspy
from dspy.retrieve.azureaisearch_rm import AzureAISearchRM

azure_search = AzureAISearchRM(
    "search_service_name",
    "search_api_key",
    "search_index_name",
    "field_text",
    k=3
)

retrieval_response = azure_search("What is Thermodynamics", k=3)
for result in retrieval_response:
print("Text:", result.long_text, "\n")
```

***

<AuthorDetails name="Arnav Singhvi"/>
<AuthorDetails name="Prajapati Harishkumar Kishorkumar"/>
5 changes: 5 additions & 0 deletions docs/docs/quick-start/installation.mdx
@@ -53,6 +53,11 @@ import TabItem from '@theme/TabItem';
pip install dspy-ai[mongodb]
```
</TabItem>
<TabItem value="weaviate" label="Weaviate">
```text
pip install dspy-ai[weaviate]
```
</TabItem>

</Tabs>

1 change: 1 addition & 0 deletions dsp/modules/__init__.py
@@ -10,6 +10,7 @@
from .gpt3 import *
from .hf import HFModel
from .hf_client import Anyscale, HFClientTGI, Together
from .mistral import *
from .ollama import *
from .pyserini import *
from .sbert import *
15 changes: 11 additions & 4 deletions dsp/modules/azurecognitivesearch.py
@@ -12,6 +12,10 @@
"Please use the command: pip install azure-search-documents",
)

# Deprecated: This module is scheduled for removal in future releases.
# Please use the AzureAISearchRM class from dspy.retrieve.azureaisearch_rm instead.
# For more information, refer to the updated documentation.

class AzureCognitiveSearch:
"""Wrapper for the Azure Cognitive Search Retrieval."""

@@ -37,9 +41,12 @@ def __init__(
credential=self.credential)

def __call__(self, query: str, k: int = 10) -> Union[list[str], list[dotdict]]:

print("""# Deprecated: This module is scheduled for removal in future releases.
Please use the AzureAISearchRM class from dspy.retrieve.azureaisearch_rm instead.
For more information, refer to the updated documentation.""")

topk: list[dict[str, Any]] = azure_search_request(self.field_text, self.field_score, self.client, query, k)
topk = [{**d, "long_text": d["text"]} for d in topk]

return [dotdict(psg) for psg in topk]

@@ -65,6 +72,6 @@ def process_azure_result(results:SearchItemPaged, content_key:str, content_score
elif(key == content_score):
tmp["score"] = value
else:
tmp[key] = value
res.append(tmp)
return res
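To make the field mapping concrete, here is a self-contained sketch of the per-item transformation that `process_azure_result` performs. It is simplified: plain dicts stand in for items from Azure's `SearchItemPaged` results, and the function name `process_result` is illustrative:

```python
def process_result(raw, content_key, content_score_key):
    # The configured text field is renamed to "text", the configured
    # score field to "score"; any other fields pass through unchanged.
    out = {}
    for key, value in raw.items():
        if key == content_key:
            out["text"] = value
        elif key == content_score_key:
            out["score"] = value
        else:
            out[key] = value
    return out
```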