fix(sql_query): validate if the query is not malicious #1568

ArslanSaleem · 2025-01-30T18:16:59Z

Add SQL query validation to prevent execution of malicious queries in sql_loader.py.

Behavior:
- Add is_sql_query_safe function in sql_sanitizer.py to validate SQL queries against a list of dangerous keywords.
- Raise MaliciousQueryError in execute_query in sql_loader.py if query is unsafe.
Tests:
- Add tests for is_sql_query_safe in test_sql_sanitizer.py to cover various SQL query scenarios.
- Add tests in test_loader.py to ensure execute_query handles safe and malicious queries correctly.
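The behavior described above can be sketched as a standalone check plus a guard that raises before execution. This is a minimal illustration, not the PR's code: the keyword denylist and the helper name ensure_query_is_safe are hypothetical, and only is_sql_query_safe and MaliciousQueryError are names taken from the PR.

```python
import re


class MaliciousQueryError(Exception):
    """Raised when a query fails the safety check (the exception name comes from the PR)."""


# Hypothetical denylist for illustration; the PR's actual keyword list is not shown here.
DANGEROUS_KEYWORDS = (
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
    "TRUNCATE", "GRANT", "REVOKE", "MERGE", "EXEC",
)
# \b word boundaries keep identifiers such as updated_at or deleted_rows from matching.
_DANGEROUS_RE = re.compile(r"\b(?:" + "|".join(DANGEROUS_KEYWORDS) + r")\b", re.IGNORECASE)


def is_sql_query_safe(query: str) -> bool:
    """Accept a single SELECT statement that contains none of the denylisted keywords."""
    stripped = query.strip().rstrip(";")
    if ";" in stripped:  # stacked statements, e.g. "SELECT 1; DROP TABLE t"
        return False
    if not stripped.upper().startswith("SELECT"):
        return False
    return _DANGEROUS_RE.search(stripped) is None


def ensure_query_is_safe(query: str) -> str:
    """Hypothetical guard to call before execution, mirroring the check added to execute_query."""
    if not is_sql_query_safe(query):
        raise MaliciousQueryError("Query is not safe to execute.")
    return query
```

A plain keyword scan like this one produces false positives when a denylisted word appears inside a string literal (for example WHERE note = 'please drop by'); the diff in this PR parses the query with sqlglot instead, which avoids that class of mistake.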

This description was created by Ellipsis for 9b50ea0. It will automatically update as commits are pushed.

codecov · 2025-01-30T18:18:57Z

Codecov Report

Attention: Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 82.53%. Comparing base (01bf53e) to head (2d4cae7).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
pandasai/data_loader/loader.py	50.00%	2 Missing ⚠️
pandasai/helpers/sql_sanitizer.py	93.75%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1568   +/-   ##
=======================================
  Coverage   82.53%   82.53%           
=======================================
  Files          64       64           
  Lines        2416     2434   +18     
=======================================
+ Hits         1994     2009   +15     
- Misses        422      425    +3

Flag	Coverage Δ
unittests	`82.53% <85.00%> (+<0.01%)`	⬆️
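The percentages above can be reproduced from the raw counts. The sketch below assumes Codecov's default of two-decimal precision with rounding down (configurable through the coverage.precision and coverage.round settings in codecov.yml). The per-file patch line counts (4 for loader.py, 16 for sql_sanitizer.py) are inferred from the percentages and missing-line counts in the table, not stated in the report; codecov_pct is a hypothetical helper.

```python
import math
from fractions import Fraction


def codecov_pct(hits: int, lines: int, precision: int = 2) -> str:
    """Coverage percentage rounded *down* to `precision` decimals (assumed Codecov default)."""
    scale = 10 ** precision
    # Exact rational arithmetic avoids float artefacts such as 84.99999999999999
    value = math.floor(Fraction(hits * 100, lines) * scale)
    return f"{value // scale}.{value % scale:0{precision}d}"


# Project totals from the coverage diff (hits / lines)
print(codecov_pct(1994, 2416))  # base (main): 82.53
print(codecov_pct(2009, 2434))  # head: 82.53
# Patch: loader.py 2 of 4 lines hit, sql_sanitizer.py 15 of 16, combined 17 of 20
print(codecov_pct(2, 4), codecov_pct(15, 16), codecov_pct(17, 20))  # 50.00 93.75 85.00
```

This also shows why the head stays at 82.53% even though 2009/2434 (about 82.539%) is slightly above 1994/2416 (about 82.533%): the difference is under 0.01 percentage points, which matches the (+<0.01%) delta on the unittests flag.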

Flags with carried forward coverage won't be shown.

ellipsis-dev

❌ Changes requested. Reviewed everything up to 8d50ee3 in 1 minute and 54 seconds

More details

Looked at 193 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. tests/unit_tests/helpers/test_sql_sanitizer.py:55

Draft comment:
The test case test_safe_select_with_comment incorrectly expects the query to be unsafe. Comments in SQL are not inherently dangerous and should not cause a query to be marked as unsafe. Adjust the test expectation to assert is_sql_query_safe(query). This issue is also present in test_safe_select_with_inline_comment.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment suggests changing the test expectations, claiming SQL comments are safe. However, the code and tests clearly show that blocking comments is an intentional security feature. SQL comments can be used to hide malicious code or for SQL injection attacks. The current implementation considers comments unsafe by design.
Maybe there's a valid use case for allowing certain types of comments in SQL queries in some contexts. The comment author might have experience with systems where SQL comments are permitted.
While comments might be safe in some contexts, the current implementation explicitly treats them as unsafe as a security measure. The tests correctly reflect the intended behavior.
Delete the comment because it contradicts the intentional security design of blocking SQL comments, which is a valid security measure against SQL injection attacks.
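The security argument is that a comment can hide or truncate part of a query: in SELECT * FROM users WHERE name = 'x' -- AND tenant_id = 42, everything after -- (here, a tenant filter) is silently dropped. As an illustration only (this is not how the PR detects comments), a hypothetical helper contains_sql_comment can flag -- and /* */ comments while ignoring those characters inside single-quoted string literals:

```python
def contains_sql_comment(query: str) -> bool:
    """Return True if `query` has a `--` or `/*` comment outside single-quoted literals."""
    in_string = False
    i = 0
    while i < len(query):
        ch = query[i]
        if in_string:
            if ch == "'":
                # '' is an escaped quote inside a literal; skip both characters
                if i + 1 < len(query) and query[i + 1] == "'":
                    i += 2
                    continue
                in_string = False
        elif ch == "'":
            in_string = True
        elif query.startswith("--", i) or query.startswith("/*", i):
            return True
        i += 1
    return False
```

The sketch deliberately ignores dialect-specific forms such as MySQL's # comments and double-quoted identifiers; a real sanitizer is better served by a proper SQL tokenizer.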

2. tests/unit_tests/helpers/test_sql_sanitizer.py:63

Draft comment:
The comment in test_unsafe_query_with_subquery is misleading. The query is actually safe, so the test should assert is_sql_query_safe(query). Adjust the comment to reflect that there are no dangerous keywords in either the main query or the subquery.
Reason this comment was not posted:
Comment looked like it was already resolved.

3. pandasai/helpers/sql_sanitizer.py:65

Draft comment:
The logic here is incorrect. The function should return True for safe queries and False for unsafe ones, so the condition should be if parsed.key != "SELECT": return False, which ensures that only SELECT queries are considered safe.
Reason this comment was not posted:
Marked as duplicate.

4. tests/unit_tests/helpers/test_sql_sanitizer.py:63

Draft comment:
The assertion here is incorrect. The comment suggests that the query should be unsafe, but the assertion checks for safety. Update the assertion to assert not is_sql_query_safe(query) to match the comment.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_XaZsF1TPw6H2t2kX

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev · 2025-01-30T18:19:01Z

pandasai/helpers/sql_sanitizer.py

+        parsed = sqlglot.parse_one(query)
+
+        # Ensure the main query is SELECT or WITH
+        if parsed.key == "SELECT":


The condition if parsed.key == "SELECT": return False is inverted. It should let SELECT queries through, since they are generally safe, and return False for everything else. This logic error causes safe SELECT queries to be marked as unsafe.
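A related detail: sqlglot sets Expression.key to the lowercase class name (for example "select" for exp.Select and "drop" for exp.Drop), so a comparison against the uppercase string "SELECT" does not match at all. This is presumably why the follow-up commit 2d4cae7 compares parsed.key.upper(). The sketch below takes the key as a plain string so it runs without sqlglot; both function names are hypothetical reconstructions of the before and after conditions, not the PR's code.

```python
def passes_main_statement_check_buggy(key: str) -> bool:
    # Reconstruction of the reviewed diff: return False when the key equals "SELECT".
    if key == "SELECT":
        return False
    return True


def passes_main_statement_check_fixed(key: str) -> bool:
    # Shape of the follow-up fix: reject any main statement that is not a SELECT,
    # comparing case-insensitively because sqlglot keys are lowercase.
    if key.upper() != "SELECT":
        return False
    return True


# Keys as sqlglot's Expression.key reports them for the parsed root node.
for key in ("select", "drop", "insert", "union"):
    print(key, passes_main_statement_check_buggy(key), passes_main_statement_check_fixed(key))
# select True True
# drop True False
# insert True False
# union True False
```

With lowercase keys the original condition never fires, so it lets every statement through rather than blocking SELECTs. Note that the fixed check also rejects a UNION of two SELECTs, because its root node is exp.Union.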

gventuri · 2025-01-30T19:17:50Z

pandasai/data_loader/loader.py

@@ -197,6 +197,9 @@ def execute_query(self, query: str, params: Optional[list] = None) -> pd.DataFra
        load_function = self._get_loader_function(source_type)

        try:
+            if not is_sql_query_safe(formatted_query):
+                raise MaliciousQueryError("Query is not safe to execute.")


Let's write a test for this!
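A test for this guard mainly needs to prove that an unsafe query never reaches the load function. The sketch below is hypothetical throughout: SQLDatasetLoader is a stripped-down stand-in for the real loader, and is_sql_query_safe is replaced by a deliberately simple stand-in so the test is self-contained. In the repository, the equivalent tests live in test_loader.py and exercise the real sanitizer.

```python
import unittest
from typing import Callable, Optional
from unittest.mock import MagicMock


class MaliciousQueryError(Exception):
    """Stand-in for the exception introduced by the PR."""


def is_sql_query_safe(query: str) -> bool:
    """Deliberately simple stand-in: a single statement that starts with SELECT."""
    stripped = query.strip().rstrip(";")
    return stripped.upper().startswith("SELECT") and ";" not in stripped


class SQLDatasetLoader:
    """Hypothetical, stripped-down loader with the same guard as execute_query in the diff."""

    def __init__(self, load_function: Callable[[str, Optional[list]], object]):
        self._load_function = load_function

    def execute_query(self, query: str, params: Optional[list] = None):
        # Validate before anything touches the database connection.
        if not is_sql_query_safe(query):
            raise MaliciousQueryError("Query is not safe to execute.")
        return self._load_function(query, params)


class TestExecuteQueryGuard(unittest.TestCase):
    def test_safe_query_reaches_load_function(self):
        load = MagicMock(return_value="rows")
        loader = SQLDatasetLoader(load)
        self.assertEqual(loader.execute_query("SELECT * FROM users"), "rows")
        load.assert_called_once_with("SELECT * FROM users", None)

    def test_malicious_query_never_reaches_load_function(self):
        load = MagicMock()
        loader = SQLDatasetLoader(load)
        for query in ("DROP TABLE users", "SELECT 1; DELETE FROM users"):
            with self.assertRaises(MaliciousQueryError):
                loader.execute_query(query)
        load.assert_not_called()
```

The key assertion is load.assert_not_called(): raising the right exception is not enough if the query was already sent to the database. pytest collects unittest.TestCase classes, so this style fits the existing test suite.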

ellipsis-dev

👍 Looks good to me! Incremental review on 2d4cae7 in 45 seconds

More details

Looked at 15 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. pandasai/helpers/sql_sanitizer.py:65

Draft comment:
The check for the main query being "SELECT" is too restrictive. Consider allowing "WITH" clauses as they are often used with SELECT queries.

        if parsed.key.upper() not in ["SELECT", "WITH"]:

Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The file is a security-focused SQL sanitizer. WITH clauses are common table expressions that are read-only and can't modify data. However, the code's intent seems to be maximum security by only allowing pure SELECT statements. The change from checking if key == "SELECT" to key.upper() != "SELECT" suggests this restriction was intentional. In a security context, being more restrictive is often the safer choice.
WITH clauses are indeed safe for read-only operations and blocking them might be overly restrictive for legitimate use cases. The comment raises a valid technical point.
However, this is a security-focused sanitizer where being overly restrictive is better than being too permissive. The change appears intentional to increase security.
Delete the comment. While technically correct, the restriction to SELECT-only appears to be an intentional security decision, and we shouldn't second-guess security restrictions without strong justification.

2. pandasai/helpers/sql_sanitizer.py:66

Draft comment:
The error message should be more descriptive. Consider using: "The main query must be a SELECT statement."
Reason this comment was not posted:
Comment was on unchanged code.

Workflow ID: wflow_vGj0Ohvhxjd4yLhu

ellipsis-dev

❌ Changes requested. Incremental review on 9b50ea0 in 33 seconds

More details

Looked at 91 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comment based on config settings.

1. pandasai/data_loader/sql_loader.py:43

Draft comment:
Consider providing more details in the error message to help users understand why the query is considered unsafe. For example, include which part of the query triggered the error.
Reason this comment was not posted:
Confidence changes required: 50%
The PR introduces a security check for SQL queries, but the error message could be more informative.
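One way to act on this suggestion is to have the check return a reason and carry it on the exception. The sketch below is hypothetical (unsafe_reason, check_query, the reason attribute, and the keyword list are all illustrative), and it reuses the wording proposed later in this review for the message prefix.

```python
import re
from typing import Optional


class MaliciousQueryError(Exception):
    """Carries the reason a query was rejected (hypothetical extension of the PR's exception)."""

    def __init__(self, reason: str):
        super().__init__(f"The SQL query is deemed unsafe and will not be executed: {reason}")
        self.reason = reason


# Hypothetical denylist for illustration only.
_DENYLIST = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE)\b", re.IGNORECASE)


def unsafe_reason(query: str) -> Optional[str]:
    """Return why `query` is unsafe, or None if it passes."""
    stripped = query.strip().rstrip(";")
    if not stripped.upper().startswith("SELECT"):
        return "only SELECT statements are allowed"
    match = _DENYLIST.search(stripped)
    if match:
        return f"forbidden keyword {match.group(1).upper()!r}"
    return None


def check_query(query: str) -> None:
    reason = unsafe_reason(query)
    if reason is not None:
        raise MaliciousQueryError(reason)
```

Whether to surface the exact trigger to end users is a trade-off: it helps legitimate users fix their query, but it also tells an attacker precisely which rule fired.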

Workflow ID: wflow_9ajIftmig4KnxG6f

ellipsis-dev · 2025-01-31T10:04:18Z

pandasai/data_loader/sql_loader.py

@@ -38,6 +39,9 @@ def execute_query(self, query: str, params: Optional[list] = None) -> pd.DataFra
        formatted_query = self.query_builder.format_query(query)
        load_function = self._get_loader_function(source_type)

+        if not is_sql_query_safe(formatted_query):
+            raise MaliciousQueryError("Query is not safe to execute.")


Consider rephrasing the error message for clarity and professionalism.

Suggested change

-            raise MaliciousQueryError("Query is not safe to execute.")
+            raise MaliciousQueryError("The SQL query is deemed unsafe and will not be executed.")

fix(sql_query): validate if the query is not malicious

8d50ee3

ArslanSaleem requested a review from gventuri January 30, 2025 18:16

ellipsis-dev bot reviewed Jan 30, 2025

gventuri reviewed Jan 30, 2025

fix(sql_sanitzer): fix condition of sql

2d4cae7

ellipsis-dev bot reviewed Jan 31, 2025

ArslanSaleem added 2 commits January 31, 2025 09:52

Merge branch 'main' into fix/sql_sanitize

554b320

feat(sql_sanitize): integrate sql_sanitize in new loader and test cases

9b50ea0

ellipsis-dev bot reviewed Jan 31, 2025