SQL Extraction Issue with Multi-CTE Queries #751

SynapticSolutionsAI · 2025-01-11T09:53:38Z

When processing queries with multiple CTEs, Vanna's SQL extraction mechanism appears to start from the middle of the query rather than the beginning, resulting in syntax errors. The LLM generates the correct SQL, but the extraction process truncates it incorrectly.

Here's an anonymized example from our logs demonstrating the issue:
LLM Response (correct, complete SQL):
WITH first_cte AS ( SELECT id, timestamp FROM table1 t1 WHERE condition1 = 'value1' GROUP BY id ), second_cte AS ( SELECT t2.id, MAX(t2.date) AS latest_date FROM table2 t2 JOIN table3 t3 ON t3.id = t2.id WHERE condition2 = 'value2' GROUP BY t2.id ) SELECT t.name, t.date, t.metric FROM table4 t JOIN first_cte fc ON fc.id = t.id JOIN second_cte sc ON sc.id = t.id ORDER BY t.metric DESC LIMIT 50;

Extracted SQL (truncated, starts from middle):

SELECT t2.id, MAX(t2.date) AS latest_date FROM table2 t2 JOIN table3 t3 ON t3.id = t2.id WHERE condition2 = 'value2' GROUP BY t2.id ) SELECT t.name, t.date, t.metric FROM table4 t...

The LLM response contains the complete, correct SQL query
The extraction process consistently:
- Misses the first WITH clause
- Starts from a middle SELECT statement
- Results in unmatched parentheses
- Sometimes duplicates the truncated parts

Questions

What causes the extraction to start from the middle of the query rather than the beginning and what is the pattern?
Is there a way to ensure the extraction process captures the complete query, starting from the first WITH clause?
Are there any patterns s or best practices to handle complex queries with multiple CTEs?

We've observed this behavior inconsistently with queries containing multiple CTEs, some queries work, while others don't. On this basis, the behavior is consistent, affected queries always fail, unaffected CTE queries always work. The issue appears to be in the extraction process rather than the LLM response, as the complete SQL in the LLM response is correct.

The text was updated successfully, but these errors were encountered:

SynapticSolutionsAI added the bug Something isn't working label Jan 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQL Extraction Issue with Multi-CTE Queries #751

SQL Extraction Issue with Multi-CTE Queries #751

SynapticSolutionsAI commented Jan 11, 2025

SQL Extraction Issue with Multi-CTE Queries #751

SQL Extraction Issue with Multi-CTE Queries #751

Comments

SynapticSolutionsAI commented Jan 11, 2025

Questions