You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When processing queries with multiple CTEs, Vanna's SQL extraction mechanism appears to start from the middle of the query rather than the beginning, resulting in syntax errors. The LLM generates the correct SQL, but the extraction process truncates it incorrectly.
Here's an anonymized example from our logs demonstrating the issue:
LLM Response (correct, complete SQL): WITH first_cte AS ( SELECT id, timestamp FROM table1 t1 WHERE condition1 = 'value1' GROUP BY id ), second_cte AS ( SELECT t2.id, MAX(t2.date) AS latest_date FROM table2 t2 JOIN table3 t3 ON t3.id = t2.id WHERE condition2 = 'value2' GROUP BY t2.id ) SELECT t.name, t.date, t.metric FROM table4 t JOIN first_cte fc ON fc.id = t.id JOIN second_cte sc ON sc.id = t.id ORDER BY t.metric DESC LIMIT 50;
Extracted SQL (truncated, starts from middle):
SELECT t2.id, MAX(t2.date) AS latest_date FROM table2 t2 JOIN table3 t3 ON t3.id = t2.id WHERE condition2 = 'value2' GROUP BY t2.id ) SELECT t.name, t.date, t.metric FROM table4 t...
The LLM response contains the complete, correct SQL query
The extraction process consistently:
Misses the first WITH clause
Starts from a middle SELECT statement
Results in unmatched parentheses
Sometimes duplicates the truncated parts
Questions
What causes the extraction to start from the middle of the query rather than the beginning and what is the pattern?
Is there a way to ensure the extraction process captures the complete query, starting from the first WITH clause?
Are there any patterns s or best practices to handle complex queries with multiple CTEs?
We've observed this behavior inconsistently with queries containing multiple CTEs, some queries work, while others don't. On this basis, the behavior is consistent, affected queries always fail, unaffected CTE queries always work. The issue appears to be in the extraction process rather than the LLM response, as the complete SQL in the LLM response is correct.
The text was updated successfully, but these errors were encountered:
When processing queries with multiple CTEs, Vanna's SQL extraction mechanism appears to start from the middle of the query rather than the beginning, resulting in syntax errors. The LLM generates the correct SQL, but the extraction process truncates it incorrectly.
Here's an anonymized example from our logs demonstrating the issue:
LLM Response (correct, complete SQL):
WITH first_cte AS ( SELECT id, timestamp FROM table1 t1 WHERE condition1 = 'value1' GROUP BY id ), second_cte AS ( SELECT t2.id, MAX(t2.date) AS latest_date FROM table2 t2 JOIN table3 t3 ON t3.id = t2.id WHERE condition2 = 'value2' GROUP BY t2.id ) SELECT t.name, t.date, t.metric FROM table4 t JOIN first_cte fc ON fc.id = t.id JOIN second_cte sc ON sc.id = t.id ORDER BY t.metric DESC LIMIT 50;
Extracted SQL (truncated, starts from middle):
SELECT t2.id, MAX(t2.date) AS latest_date FROM table2 t2 JOIN table3 t3 ON t3.id = t2.id WHERE condition2 = 'value2' GROUP BY t2.id ) SELECT t.name, t.date, t.metric FROM table4 t...
WITH
clauseSELECT
statementQuestions
WITH
clause?We've observed this behavior inconsistently with queries containing multiple CTEs, some queries work, while others don't. On this basis, the behavior is consistent, affected queries always fail, unaffected CTE queries always work. The issue appears to be in the extraction process rather than the LLM response, as the complete SQL in the LLM response is correct.
The text was updated successfully, but these errors were encountered: