-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sql_query): validate if the query is not malicious #1568
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1568 +/- ##
=======================================
Coverage 82.53% 82.53%
=======================================
Files 64 64
Lines 2416 2434 +18
=======================================
+ Hits 1994 2009 +15
- Misses 422 425 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to 8d50ee3 in 1 minute and 54 seconds
More details
- Looked at
193
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
4
drafted comments based on config settings.
1. tests/unit_tests/helpers/test_sql_sanitizer.py:55
- Draft comment:
The test casetest_safe_select_with_comment
incorrectly expects the query to be unsafe. Comments in SQL are not inherently dangerous and should not cause a query to be marked as unsafe. Adjust the test expectation toassert is_sql_query_safe(query)
. This issue is also present intest_safe_select_with_inline_comment
. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment suggests changing the test expectations, claiming SQL comments are safe. However, the code and tests clearly show that blocking comments is an intentional security feature. SQL comments can be used to hide malicious code or for SQL injection attacks. The current implementation considers comments unsafe by design.
Maybe there's a valid use case for allowing certain types of comments in SQL queries in some contexts. The comment author might have experience with systems where SQL comments are permitted.
While comments might be safe in some contexts, the current implementation explicitly treats them as unsafe as a security measure. The tests correctly reflect the intended behavior.
Delete the comment because it contradicts the intentional security design of blocking SQL comments, which is a valid security measure against SQL injection attacks.
2. tests/unit_tests/helpers/test_sql_sanitizer.py:63
- Draft comment:
The comment intest_unsafe_query_with_subquery
is misleading. The query is actually safe, and the test should expectassert is_sql_query_safe(query)
to beTrue
. Adjust the comment to reflect that there are no dangerous keywords in the main or subquery. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. pandasai/helpers/sql_sanitizer.py:65
- Draft comment:
The logic here is incorrect. The function should returnTrue
for safe queries andFalse
for unsafe ones. The condition should beif parsed.key != "SELECT":
to ensure only SELECT queries are considered safe. - Reason this comment was not posted:
Marked as duplicate.
4. tests/unit_tests/helpers/test_sql_sanitizer.py:63
- Draft comment:
The assertion here is incorrect. The comment suggests that the query should be unsafe, but the assertion checks for safety. Update the assertion toassert not is_sql_query_safe(query)
to match the comment. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_XaZsF1TPw6H2t2kX
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
pandasai/helpers/sql_sanitizer.py
Outdated
parsed = sqlglot.parse_one(query) | ||
|
||
# Ensure the main query is SELECT or WITH | ||
if parsed.key == "SELECT": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition if parsed.key == "SELECT": return False
is incorrect. It should return True
for SELECT
queries, as they are generally safe. This logic error causes safe SELECT
queries to be marked as unsafe.
pandasai/data_loader/loader.py
Outdated
@@ -197,6 +197,9 @@ def execute_query(self, query: str, params: Optional[list] = None) -> pd.DataFra | |||
load_function = self._get_loader_function(source_type) | |||
|
|||
try: | |||
if not is_sql_query_safe(formatted_query): | |||
raise MaliciousQueryError("Query is not safe to execute.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's write a test for this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 2d4cae7 in 45 seconds
More details
- Looked at
15
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
2
drafted comments based on config settings.
1. pandasai/helpers/sql_sanitizer.py:65
- Draft comment:
The check for the main query being "SELECT" is too restrictive. Consider allowing "WITH" clauses as they are often used with SELECT queries.
if parsed.key.upper() not in ["SELECT", "WITH"]:
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The file is a security-focused SQL sanitizer. WITH clauses are common table expressions that are read-only and can't modify data. However, the code's intent seems to be maximum security by only allowing pure SELECT statements. The change from checking if key == "SELECT" to key.upper() != "SELECT" suggests this restriction was intentional. In a security context, being more restrictive is often the safer choice.
WITH clauses are indeed safe for read-only operations and blocking them might be overly restrictive for legitimate use cases. The comment raises a valid technical point.
However, this is a security-focused sanitizer where being overly restrictive is better than being too permissive. The change appears intentional to increase security.
Delete the comment. While technically correct, the restriction to SELECT-only appears to be an intentional security decision, and we shouldn't second-guess security restrictions without strong justification.
2. pandasai/helpers/sql_sanitizer.py:66
- Draft comment:
The error message should be more descriptive. Consider using:"The main query must be a SELECT statement."
- Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_vGj0Ohvhxjd4yLhu
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Incremental review on 9b50ea0 in 33 seconds
More details
- Looked at
91
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. pandasai/data_loader/sql_loader.py:43
- Draft comment:
Consider providing more details in the error message to help users understand why the query is considered unsafe. For example, include which part of the query triggered the error. - Reason this comment was not posted:
Confidence changes required:50%
The PR introduces a security check for SQL queries, but the error message could be more informative.
Workflow ID: wflow_9ajIftmig4KnxG6f
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
@@ -38,6 +39,9 @@ def execute_query(self, query: str, params: Optional[list] = None) -> pd.DataFra | |||
formatted_query = self.query_builder.format_query(query) | |||
load_function = self._get_loader_function(source_type) | |||
|
|||
if not is_sql_query_safe(formatted_query): | |||
raise MaliciousQueryError("Query is not safe to execute.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider rephrasing the error message for clarity and professionalism.
raise MaliciousQueryError("Query is not safe to execute.") | |
raise MaliciousQueryError("The SQL query is deemed unsafe and will not be executed.") |
Important
Add SQL query validation to prevent execution of malicious queries in
sql_loader.py
.is_sql_query_safe
function insql_sanitizer.py
to validate SQL queries against a list of dangerous keywords.MaliciousQueryError
inexecute_query
insql_loader.py
if query is unsafe.is_sql_query_safe
intest_sql_sanitizer.py
to cover various SQL query scenarios.test_loader.py
to ensureexecute_query
handles safe and malicious queries correctly.This description was created by for 9b50ea0. It will automatically update as commits are pushed.