Security: LLM-generated SQL is executed without safety controls#193
Conversation
The `/query` endpoint generates SQL from natural language (`generate_sql`) and executes it (`execute_query`) without validation/sandboxing. Prompt injection or model errors can produce harmful SQL statements. Affected files: main.py Signed-off-by: tomaioo <203048277+tomaioo@users.noreply.github.com>
| raise HTTPException(status_code=400, detail="Generated SQL query is empty") | ||
| if "--" in normalized_query or "/*" in normalized_query or "*/" in normalized_query: | ||
| raise HTTPException(status_code=400, detail="Generated SQL query contains blocked SQL patterns") | ||
| if ";" in normalized_query[:-1]: |
There was a problem hiding this comment.
WARNING: Redundant semicolon check
The condition if ";" in normalized_query[:-1] (line 53) is redundant because if the query ends with a semicolon, it gets stripped at line 56. This check will incorrectly reject valid single-statement queries with trailing semicolons. Remove line 53 and let line 55 handle trailing semicolons.
| raise HTTPException(status_code=400, detail="Only read-only SELECT queries are allowed") | ||
| for keyword in DISALLOWED_SQL_KEYWORDS: | ||
| if re.search(rf"\b{keyword}\b", lowered_query): | ||
| raise HTTPException(status_code=400, detail=f"Generated SQL query contains blocked keyword: {keyword}") |
There was a problem hiding this comment.
SUGGESTION: Regex could be more precise
The keyword check uses \b word boundaries (line 62), which works but might have edge cases. Consider using lookarounds for clearer keyword detection or adding additional checks for common injection patterns like subqueries or stored procedures.
| if normalized_query.endswith(";"): | ||
| normalized_query = normalized_query[:-1].strip() | ||
| lowered_query = normalized_query.lower() | ||
| if not re.match(r"^(select|with)\b", lowered_query): |
There was a problem hiding this comment.
SUGGESTION: Case-insensitive regex could fail for edge cases
The regex r"^(select|with)\b" (line 58) works with .lower(), which handles case-insensitive matching. However, this pattern might fail for edge cases like queries starting with multiple keywords or with unusual spacing. Consider a more robust pattern or use a SQL parsing library for production use.
Code Review SummaryStatus: 1 Issue Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Other Observations (not in diff)Issues found in unchanged code that cannot receive inline comments:
Files Reviewed (1 file)
Reviewed by glm-4.7-flash-20260119 · 147,200 tokens |
Summary
Security: LLM-generated SQL is executed without safety controls
Problem
Severity:
High| File:mcp_ai_agents/telemetry-mcp-okahu/main.py:L34The
/queryendpoint generates SQL from natural language (generate_sql) and executes it (execute_query) without validation/sandboxing. Prompt injection or model errors can produce harmful SQL statements.Solution
Introduce SQL safety gates before execution: parse and enforce read-only statements, block DDL/DML, enforce table/column allowlists, and run with least-privilege DB credentials.
Changes
mcp_ai_agents/telemetry-mcp-okahu/main.py(modified)Testing