
Commit

catchup with main
KCaverly committed Mar 25, 2024
2 parents bfdf273 + 2dacce4 commit b3a1b14
Showing 26 changed files with 881 additions and 271 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@ Then install the package through poetry:
Note - You may need to install poetry. See [here](https://python-poetry.org/docs/#installing-with-the-official-installer)

```bash
poetry install --with dev
```

## Testing
34 changes: 34 additions & 0 deletions docs/api/language_model_clients/Mistral.md
@@ -0,0 +1,34 @@
---
sidebar_position: 9
---

# dsp.Mistral

### Usage

```python
lm = dsp.Mistral(model='mistral-medium-latest', api_key="your-mistralai-api-key")
```

### Constructor

The constructor initializes the base class `LM` and verifies the `api_key`, which may be provided explicitly or defined through the `MISTRAL_API_KEY` environment variable.

```python
class Mistral(LM):
def __init__(
self,
model: str = "mistral-medium-latest",
api_key: Optional[str] = None,
**kwargs,
):
```
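The key-resolution behavior described above can be sketched in isolation. This is an illustrative stand-in, not the actual dsp implementation; the helper name `resolve_api_key` is hypothetical:

```python
import os

def resolve_api_key(api_key=None):
    # An explicit argument wins; otherwise fall back to the
    # MISTRAL_API_KEY environment variable, as the constructor does.
    key = api_key or os.environ.get("MISTRAL_API_KEY")
    if not key:
        raise ValueError("No API key provided and MISTRAL_API_KEY is not set.")
    return key
```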

**Parameters:**
- `model` (_str_): Name of the Mistral AI pretrained model to use. Defaults to `mistral-medium-latest`.
- `api_key` (_Optional[str]_, _optional_): API key for authenticating with Mistral AI. Defaults to None, in which case the key is read from the `MISTRAL_API_KEY` environment variable.
- `**kwargs`: Additional language model arguments to pass to the API provider.

### Methods

Refer to [`dspy.Mistral`](#) documentation.
9 changes: 7 additions & 2 deletions docs/api/retrieval_model_clients/AzureCognitiveSearch.md
@@ -6,7 +6,7 @@ sidebar_position: 3

### Constructor

The constructor initializes an instance of the `AzureCognitiveSearch` class and sets up parameters for sending queries and retrieving results with the Azure Cognitive Search server.

```python
class AzureCognitiveSearch:
@@ -21,6 +21,7 @@ class AzureCognitiveSearch:
```

**Parameters:**

- `search_service_name` (_str_): Name of Azure Cognitive Search server.
- `search_api_key` (_str_): API Authentication token for accessing Azure Cognitive Search server.
- `search_index_name` (_str_): Name of search index in the Azure Cognitive Search server.
@@ -31,4 +32,8 @@ class AzureCognitiveSearch:

Refer to [ColBERTv2](/api/retrieval_model_clients/ColBERTv2) documentation. Keep in mind there is no `simplify` flag for AzureCognitiveSearch.

AzureCognitiveSearch supports sending queries and processing the received results, mapping content and scores into the format expected by the DSP framework.

### Deprecation Notice

This module is scheduled for removal in future releases. Please use the `AzureAISearchRM` class from `dspy.retrieve.azureaisearch_rm` instead. For more information, refer to the updated documentation (docs/docs/deep-dive/retrieval_models_clients/Azure.mdx).
4 changes: 2 additions & 2 deletions docs/api/retrieval_model_clients/ChromadbRM.md
@@ -41,7 +41,7 @@ Search the chromadb collection for the top `k` passages matching the given query
ChromadbRM offers the flexibility to use a variety of embedding functions, as outlined in the [chromadb embeddings documentation](https://docs.trychroma.com/embeddings). While different options are available, this example demonstrates how to utilize OpenAI embeddings specifically.

```python
from dspy.retrieve.chromadb_rm import ChromadbRM
import os
import openai
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
@@ -62,4 +62,4 @@ results = retriever_model("Explore the significance of quantum computing", k=5)

for result in results:
print("Document:", result.long_text, "\n")
```
13 changes: 7 additions & 6 deletions docs/docs/building-blocks/1-language_models.md
@@ -18,10 +18,6 @@ For example, to use OpenAI language models, you can do it as follows.
gpt3_turbo = dspy.OpenAI(model='gpt-3.5-turbo-1106', max_tokens=300)
dspy.configure(lm=gpt3_turbo)
```

## Directly calling the LM.

@@ -31,11 +27,16 @@ You can simply call the LM with a string to give it a raw prompt, i.e. a string.
gpt3_turbo("hello! this is a raw prompt to GPT-3.5")
```

**Output:**
```text
['Hello! How can I assist you today?']
```

This is almost never the recommended way to interact with LMs in DSPy, but it is allowed.

## Using the LM with DSPy signatures.

You can also use the LM via DSPy [`signature` (input/output spec)](https://dspy-docs.vercel.app/docs/building-blocks/signatures) and [`modules`](https://dspy-docs.vercel.app/docs/building-blocks/modules), which we discuss in more depth in the remaining guides.

```python
# Define a module (ChainOfThought) and assign it a signature (return an answer, given a question).
@@ -172,4 +173,4 @@ model = 'dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1'
model_path = 'dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-cuda.so'

llama = dspy.ChatModuleClient(model=model, model_path=model_path)
```
2 changes: 1 addition & 1 deletion docs/docs/building-blocks/2-signatures.md
@@ -6,7 +6,7 @@ sidebar_position: 2

When we assign tasks to LMs in DSPy, we specify the behavior we need as a Signature.

**A signature is a declarative specification of input/output behavior of a DSPy module.** Signatures allow you to tell the LM _what_ it needs to do, rather than specify _how_ we should ask the LM to do it.


You're probably familiar with function signatures, which specify the input and output arguments and their types. DSPy signatures are similar, but the differences are that:
2 changes: 1 addition & 1 deletion docs/docs/building-blocks/4-data.md
@@ -78,7 +78,7 @@ input_key_only = article_summary.inputs()
non_input_key_only = article_summary.labels()

print("Example object with Input fields only:", input_key_only)
print("Example object with Non-Input fields only:", non_input_key_only)
```

**Output**
2 changes: 1 addition & 1 deletion docs/docs/building-blocks/solving_your_task.md
@@ -8,7 +8,7 @@ Using DSPy well for solving a new task is just doing good machine learning with

What this means is that it's an iterative process. You make some initial choices, which will be sub-optimal, and then you refine them incrementally.

As we discuss below, you will define your task and the metrics you want to maximize, and prepare a few example inputs — typically without labels (or only with labels for the final outputs, if your metric requires them). Then, you build your pipeline by selecting built-in layers [(`modules`)](https://dspy-docs.vercel.app/docs/building-blocks/modules) to use, giving each layer a [`signature` (input/output spec)](https://dspy-docs.vercel.app/docs/building-blocks/signatures), and then calling your modules freely in your Python code. Lastly, you use a DSPy [`optimizer`](https://dspy-docs.vercel.app/docs/building-blocks/optimizers) to compile your code into high-quality instructions, automatic few-shot examples, or updated LM weights for your LM.


## 1) Define your task.
2 changes: 1 addition & 1 deletion docs/docs/deep-dive/data-handling/examples.mdx
@@ -68,7 +68,7 @@ input_key_only = article_summary.inputs()
non_input_key_only = article_summary.labels()

print("Example object with Input fields only:", input_key_only)
print("Example object with Non-Input fields only:", non_input_key_only)
```

**Output**
124 changes: 87 additions & 37 deletions docs/docs/deep-dive/retrieval_models_clients/Azure.mdx
@@ -4,80 +4,130 @@ sidebar_position: 2

import AuthorDetails from '@site/src/components/AuthorDetails';

# AzureAISearch

A retrieval module that utilizes Azure AI Search to retrieve top passages for a given query.

## Prerequisites

```bash
pip install azure-core
pip install azure-search-documents
```

## Setting up the AzureAISearchRM Client

The constructor initializes an instance of the `AzureAISearchRM` class and sets up parameters for sending queries and retrieving results with the Azure AI Search server.

- `search_service_name` (str): The name of the Azure AI Search service.
- `search_api_key` (str): The API key for accessing the Azure AI Search service.
- `search_index_name` (str): The name of the search index in the Azure AI Search service.
- `field_text` (str): The name of the field containing text content in the search index. This field will be mapped to the "content" field in the dsp framework.
- `k` (int, optional): The default number of top passages to retrieve. Defaults to 3.
- `semantic_ranker` (bool, optional): Whether to use semantic ranking. Defaults to False.
- `filter` (str, optional): Additional filter query. Defaults to None.
- `query_language` (str, optional): The language of the query. Defaults to "en-Us".
- `query_speller` (str, optional): The speller mode. Defaults to "lexicon".
- `use_semantic_captions` (bool, optional): Whether to use semantic captions. Defaults to False.
- `query_type` (Optional[QueryType], optional): The type of query. Defaults to QueryType.FULL.
- `semantic_configuration_name` (str, optional): The name of the semantic configuration. Defaults to None.

Available query types:

- `SIMPLE`: Uses the simple query syntax for searches. Search text is interpreted using a simple query language that allows for symbols such as `+`, `*`, and `""`. Queries are evaluated across all searchable fields by default, unless the `searchFields` parameter is specified.
- `FULL`: Uses the full Lucene query syntax for searches. Search text is interpreted using the Lucene query language, which allows field-specific and weighted searches, as well as other advanced features.
- `SEMANTIC`: Best suited for queries expressed in natural language as opposed to keywords. Improves the precision of search results by re-ranking the top search results using a ranking model trained on the Web corpus.

More details: https://learn.microsoft.com/en-us/azure/search/search-query-overview

Example of the AzureAISearchRM constructor:

```python
AzureAISearchRM(
search_service_name: str,
search_api_key: str,
search_index_name: str,
field_text: str,
k: int = 3,
semantic_ranker: bool = False,
filter: str = None,
query_language: str = "en-Us",
query_speller: str = "lexicon",
use_semantic_captions: bool = False,
query_type: Optional[QueryType] = QueryType.FULL,
semantic_configuration_name: str = None
)
```

## Under the Hood

### `forward(self, query_or_queries: Union[str, List[str]], k: Optional[int] = None) -> dspy.Prediction`

**Parameters:**

- `query_or_queries` (Union[str, List[str]]): The query or queries to search for.
- `k` (_Optional[int]_, _optional_): The number of results to retrieve. If not specified, defaults to the value set during initialization.

**Returns:**
- `dspy.Prediction`: Contains the retrieved passages, each represented as a `dotdict` with a `long_text` attribute.

Internally, the method handles the specifics of preparing the request query to the Azure AI Search service and corresponding payload to obtain the response.

The function handles the retrieval of the top-k passages based on the provided query.
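As a rough sketch of the result shape, the mapping from raw hits to passages can be illustrated as follows. The `dotdict` stand-in and the `to_passages` helper here are illustrative, not the exact dsp internals:

```python
class dotdict(dict):
    # Minimal stand-in for dsp's dotdict: attribute-style access to keys.
    __getattr__ = dict.get

def to_passages(hits, field_text="content"):
    # Each raw search hit becomes a dotdict whose `long_text`
    # attribute carries the configured text field.
    return [dotdict({**hit, "long_text": hit[field_text]}) for hit in hits]
```

This mirrors how each retrieved passage exposes a `long_text` attribute in the returned `dspy.Prediction`.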

## Sending Retrieval Requests via AzureAISearchRM Client

1. _**Recommended**_ Configure default RM using `dspy.configure`.

This allows you to define programs in DSPy and have DSPy internally conduct retrieval on your queries using the configured RM.

```python
import dspy
from dspy.retrieve.azureaisearch_rm import AzureAISearchRM

azure_search = AzureAISearchRM(
    "search_service_name",
    "search_api_key",
    "search_index_name",
    "field_text",
    k=3
)

dspy.settings.configure(rm=azure_search)
retrieve = dspy.Retrieve(k=3)
retrieval_response = retrieve("What is Thermodynamics").passages

for result in retrieval_response:
print("Text:", result, "\n")
```

2. Generate responses using the client directly.

```python
import dspy
from dspy.retrieve.azureaisearch_rm import AzureAISearchRM

azure_search = AzureAISearchRM(
    "search_service_name",
    "search_api_key",
    "search_index_name",
    "field_text",
    k=3
)

retrieval_response = azure_search("What is Thermodynamics", k=3)
for result in retrieval_response:
print("Text:", result.long_text, "\n")
```

***

<AuthorDetails name="Arnav Singhvi"/>
<AuthorDetails name="Prajapati Harishkumar Kishorkumar"/>
5 changes: 5 additions & 0 deletions docs/docs/quick-start/installation.mdx
@@ -53,6 +53,11 @@ import TabItem from '@theme/TabItem';
pip install dspy-ai[mongodb]
```
</TabItem>
<TabItem value="weaviate" label="Weaviate">
```text
pip install dspy-ai[weaviate]
```
</TabItem>

</Tabs>

1 change: 1 addition & 0 deletions dsp/modules/__init__.py
@@ -10,6 +10,7 @@
from .gpt3 import *
from .hf import HFModel
from .hf_client import Anyscale, HFClientTGI, Together
from .mistral import *
from .ollama import *
from .pyserini import *
from .sbert import *
15 changes: 11 additions & 4 deletions dsp/modules/azurecognitivesearch.py
@@ -12,6 +12,10 @@
"Please use the command: pip install azure-search-documents",
)

# Deprecated: This module is scheduled for removal in future releases.
# Please use the AzureAISearchRM class from dspy.retrieve.azureaisearch_rm instead.
# For more information, refer to the updated documentation.

class AzureCognitiveSearch:
"""Wrapper for the Azure Cognitive Search Retrieval."""

@@ -37,9 +41,12 @@ def __init__(
credential=self.credential)

def __call__(self, query: str, k: int = 10) -> Union[list[str], list[dotdict]]:

print("""# Deprecated: This module is scheduled for removal in future releases.
Please use the AzureAISearchRM class from dspy.retrieve.azureaisearch_rm instead.
For more information, refer to the updated documentation.""")

topk: list[dict[str, Any]] = azure_search_request(self.field_text, self.field_score, self.client, query, k)
topk = [{**d, "long_text": d["text"]} for d in topk]

return [dotdict(psg) for psg in topk]

@@ -65,6 +72,6 @@ def process_azure_result(results:SearchItemPaged, content_key:str, content_score
elif(key == content_score):
tmp["score"] = value
else:
tmp[key] = value
res.append(tmp)
return res
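To make the field mapping concrete, here is a self-contained sketch of the per-item transformation that `process_azure_result` performs. It is simplified: plain dicts stand in for items from Azure's `SearchItemPaged` results, and the function name `process_result` is illustrative:

```python
def process_result(raw, content_key, content_score_key):
    # The configured text field is renamed to "text", the configured
    # score field to "score"; any other fields pass through unchanged.
    out = {}
    for key, value in raw.items():
        if key == content_key:
            out["text"] = value
        elif key == content_score_key:
            out["score"] = value
        else:
            out[key] = value
    return out
```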