You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@pamelafox
I have been trying for a few days to adapt the system to work with my data - a single table of items - let's say they are cars
I have modified following the guidance in customizing_data.md and the other data branch. I have looked through other branches, am just not having any luck - example queries, few shot, answer.txt, query.txt and searcher examples below.
I am really not sure where the structured logic of things like queries and creating a ton of functions to handle the possible query types - and how much to rely on the prompts to do things. For instance, when I ask for a list of something I tend to get 3 full car descriptions. I realize the 3 is probably being set in the developer settings.
Guidance here would be greatly appreciated! I realize this project is 5 months old and as an ex-microsoftee, I know you are moving ahead at full speed, but I have found this template extremely helpful in accelerating my client's embracing AI and Azure for all AI related workloads. (I fully understand this is not intended for production use, but it has many best practices built into the solution.
Specifically, I would like to know:
what is the extent of the power of the prompts, what aspects of system behavior should I lean on them for (see below)
how much should I be modifying QueryRewriter and PostgresSearcher, how generic should those changes be (see below)
exactly what components in the FrontEnd need to be modified to handle different response types - say lists, citations under each response in a field that toggles visibility like in a file browser.
Examples of what I have been trying - all at the same time - app runs and deploys file rarely crashes on a query.
trying to do is get support for queries like:
Q: how many 1989 Ford's do we have? I want the answer to be something like
A: we have 12 Ford cars in inventory
follow up question would you like to see a list of all of the?
Q: Which is the range of reliability ratings?
A: The cars in inventory range from 3 to 4.5 safety ratings
follow up would you like to see a list of them sorted by rating?
Q: of the different manufacturers, which is the most popular?
A: the Subaru Outback is the most popular, using the formula of the sum of the count of the number of votes for each star rating (1-5) /the number of votes
Q: what adjective word occurs the most in the comments section
A: happy was mentioned in 20 out of 120 reviews
I have tried modified the following:
QueryRewriter Updates - a bunch like this
"select": {
"type": "object",
"description": "Specify scalar operations like COUNT, AVG, etc.",
"properties": {
"operation": {
"type": "string",
"enum": ["COUNT", "AVG", "SUM", "MIN", "MAX"],
"description": "The scalar operation to perform",
},
"field": {
"type": "string",
"description": "The field to operate on, e.g. '*' for COUNT(*) or 'metrics->>'Views'' for views",
},
},
"required": ["operation", "field"],
},
PostgresSearcher Updates:
Implement scalar operations for numeric metrics
Add type coercion for mixed string/number metrics
Example query: metrics->>'Views'::numeric for consistent numeric handling
QueryRewriter Updates
Add metric name patterns to query generation
Handle numeric comparisons in natural language
Update few-shot with entries like
{
"role": "user",
"content": "how many Ford cars do we have?"
},
{
"role": "assistant",
"tool_calls": [{
"function": {
"name": "search_database",
"arguments": {
"return_scalar": {
"type": "count",
"expression": "*"
},
"filters": {
"Brand": "ig",
"category": "cars"
}
}
}
}]
}
Answers.txt
Assistant helps users find relevant cars and trucks.
Respond as if you are a knowledgeable content curator. Do NOT respond with tables.
Answer ONLY with the content details listed in the sources.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
For scalar query results i.e. how many, highest rating, (like COUNT, AVG, etc):
Look for the metrics.scalar_result field which contains the actual numeric value
Present the value clearly and directly in a complete sentence
Include the [scalar_result] citation
query.txt
You are a query generator that helps translate natural language questions into search queries or scalar operations.
For scalar operations (like counting, averaging, finding max/min), generate a scalar operation with:
select_expr: The expression to compute (e.g. MAX, COUNT, AVG)
group_by: Optional list of columns to group by
order_by: Optional list of columns to order by
filters: List of filters to apply
Examples:
"How many Ford cars are there?" -> COUNT(*) with brand = 'ford' and category='car'
"What vehicle has the most likes?" -> MAX((metrics->>'Likes')::numeric)
"Average price of Subarus?" -> AVG((metrics->>'price')::numeric) with brand = 'subaru'
For regular search queries, generate a search query based on the conversation.
If the question is not in English, translate it to English before generating the query.
If you cannot generate a search query, return the original user question.
Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching database rows.
You have access to an Azure PostgreSQL database with a posts table that has the following columns:
Text fields: model, brand, category, description
Arrays: reviews, stars, features
JSON fields: ownership_records (containing key value pairs of owner_name, puchase_date, Purchase_location, sale_date, sale_location )
For questions asking about counts, averages, sums, or other aggregate metrics:
Use the "select" operation with COUNT, AVG, SUM, etc.
Example: "how many Subaru wagons? are in stock" -> use COUNT(*) with brand="subaru" and category= "wagon"
Example: "average rating of fords?" -> use AVG(metrics->>'rating') with brand="ford"
You can combine with filters like channel, date ranges, etc.
The text was updated successfully, but these errors were encountered:
It seems like you want to be able to execute arbitrary SQL queries, not just a SELECT to grab the few most similar rows. This repository isn't well set up to do that currently, as I was thinking of this as more of a public user-facing app where you wouldn't necessarily be comfortable executing arbitrary queries.
That's on Azure SQL, but that idea is similar in PostgreSQL.
So I think you want to bring that sort of approach into this repo. I think you'd need to modify PostgresSearcher a fair bit since it's currently got most of the SQL query pre-formed, or at least add a separate function for running the more free-form suggested queries.
@pamelafox thanks - the path i am on now is as you suggest building a new funcyion in postgres server, doing some modifications to query snslyzer and then rag advanced.
would you say that the heart of the matter is in those files plus the query and answer prompts and the fews shot. I really need to know if I am looking at the right stuff and not missing anything - say api routes or who knows.
really appreciate tgr6 quick reply - answering these final questions will get me squared away.
@pamelafox
I have been trying for a few days to adapt the system to work with my data - a single table of items - let's say they are cars
I have modified following the guidance in customizing_data.md and the other data branch. I have looked through other branches, am just not having any luck - example queries, few shot, answer.txt, query.txt and searcher examples below.
I am really not sure where the structured logic of things like queries and creating a ton of functions to handle the possible query types - and how much to rely on the prompts to do things. For instance, when I ask for a list of something I tend to get 3 full car descriptions. I realize the 3 is probably being set in the developer settings.
Guidance here would be greatly appreciated! I realize this project is 5 months old and as an ex-microsoftee, I know you are moving ahead at full speed, but I have found this template extremely helpful in accelerating my client's embracing AI and Azure for all AI related workloads. (I fully understand this is not intended for production use, but it has many best practices built into the solution.
Specifically, I would like to know:
what is the extent of the power of the prompts, what aspects of system behavior should I lean on them for (see below)
how much should I be modifying QueryRewriter and PostgresSearcher, how generic should those changes be (see below)
exactly what components in the FrontEnd need to be modified to handle different response types - say lists, citations under each response in a field that toggles visibility like in a file browser.
Examples of what I have been trying - all at the same time - app runs and deploys file rarely crashes on a query.
trying to do is get support for queries like:
Q: how many 1989 Ford's do we have? I want the answer to be something like
A: we have 12 Ford cars in inventory
Q: Which is the range of reliability ratings?
A: The cars in inventory range from 3 to 4.5 safety ratings
Q: of the different manufacturers, which is the most popular?
A: the Subaru Outback is the most popular, using the formula of the sum of the count of the number of votes for each star rating (1-5) /the number of votes
Q: what adjective word occurs the most in the comments section
A: happy was mentioned in 20 out of 120 reviews
I have tried modified the following:
QueryRewriter Updates - a bunch like this
PostgresSearcher Updates:
Implement scalar operations for numeric metrics
Add type coercion for mixed string/number metrics
Example query: metrics->>'Views'::numeric for consistent numeric handling
QueryRewriter Updates
Add metric name patterns to query generation
Handle numeric comparisons in natural language
Update few-shot with entries like
{
"role": "user",
"content": "how many Ford cars do we have?"
},
{
"role": "assistant",
"tool_calls": [{
"function": {
"name": "search_database",
"arguments": {
"return_scalar": {
"type": "count",
"expression": "*"
},
"filters": {
"Brand": "ig",
"category": "cars"
}
}
}
}]
}
Answers.txt
Assistant helps users find relevant cars and trucks.
Respond as if you are a knowledgeable content curator. Do NOT respond with tables.
Answer ONLY with the content details listed in the sources.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
For scalar query results i.e. how many, highest rating, (like COUNT, AVG, etc):
query.txt
You are a query generator that helps translate natural language questions into search queries or scalar operations.
For scalar operations (like counting, averaging, finding max/min), generate a scalar operation with:
select_expr: The expression to compute (e.g. MAX, COUNT, AVG)
group_by: Optional list of columns to group by
order_by: Optional list of columns to order by
filters: List of filters to apply
Examples:
"How many Ford cars are there?" -> COUNT(*) with brand = 'ford' and category='car'
"What vehicle has the most likes?" -> MAX((metrics->>'Likes')::numeric)
"Average price of Subarus?" -> AVG((metrics->>'price')::numeric) with brand = 'subaru'
For regular search queries, generate a search query based on the conversation.
If the question is not in English, translate it to English before generating the query.
If you cannot generate a search query, return the original user question.
Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching database rows.
You have access to an Azure PostgreSQL database with a posts table that has the following columns:
Text fields: model, brand, category, description
Arrays: reviews, stars, features
JSON fields: ownership_records (containing key value pairs of owner_name, puchase_date, Purchase_location, sale_date, sale_location )
For questions asking about counts, averages, sums, or other aggregate metrics:
Use the "select" operation with COUNT, AVG, SUM, etc.
Example: "how many Subaru wagons? are in stock" -> use COUNT(*) with brand="subaru" and category= "wagon"
Example: "average rating of fords?" -> use AVG(metrics->>'rating') with brand="ford"
You can combine with filters like channel, date ranges, etc.
The text was updated successfully, but these errors were encountered: