Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sql_query): validate if the query is not malicious #1568

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Jan 30, 2025

Important

Add SQL query validation to prevent execution of malicious queries in sql_loader.py.

  • Behavior:
    • Add is_sql_query_safe function in sql_sanitizer.py to validate SQL queries against a list of dangerous keywords.
    • Raise MaliciousQueryError in execute_query in sql_loader.py if query is unsafe.
  • Tests:
    • Add tests for is_sql_query_safe in test_sql_sanitizer.py to cover various SQL query scenarios.
    • Add tests in test_loader.py to ensure execute_query handles safe and malicious queries correctly.

This description was created by Ellipsis for 9b50ea0. It will automatically update as commits are pushed.

@ArslanSaleem ArslanSaleem requested a review from gventuri January 30, 2025 18:16
Copy link

codecov bot commented Jan 30, 2025

Codecov Report

Attention: Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 82.53%. Comparing base (01bf53e) to head (2d4cae7).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
pandasai/data_loader/loader.py 50.00% 2 Missing ⚠️
pandasai/helpers/sql_sanitizer.py 93.75% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1568   +/-   ##
=======================================
  Coverage   82.53%   82.53%           
=======================================
  Files          64       64           
  Lines        2416     2434   +18     
=======================================
+ Hits         1994     2009   +15     
- Misses        422      425    +3     
Flag Coverage Δ
unittests 82.53% <85.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 8d50ee3 in 1 minute and 54 seconds

More details
  • Looked at 193 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. tests/unit_tests/helpers/test_sql_sanitizer.py:55
  • Draft comment:
    The test case test_safe_select_with_comment incorrectly expects the query to be unsafe. Comments in SQL are not inherently dangerous and should not cause a query to be marked as unsafe. Adjust the test expectation to assert is_sql_query_safe(query). This issue is also present in test_safe_select_with_inline_comment.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment suggests changing the test expectations, claiming SQL comments are safe. However, the code and tests clearly show that blocking comments is an intentional security feature. SQL comments can be used to hide malicious code or for SQL injection attacks. The current implementation considers comments unsafe by design.
    Maybe there's a valid use case for allowing certain types of comments in SQL queries in some contexts. The comment author might have experience with systems where SQL comments are permitted.
    While comments might be safe in some contexts, the current implementation explicitly treats them as unsafe as a security measure. The tests correctly reflect the intended behavior.
    Delete the comment because it contradicts the intentional security design of blocking SQL comments, which is a valid security measure against SQL injection attacks.
2. tests/unit_tests/helpers/test_sql_sanitizer.py:63
  • Draft comment:
    The comment in test_unsafe_query_with_subquery is misleading. The query is actually safe, and the test should expect assert is_sql_query_safe(query) to be True. Adjust the comment to reflect that there are no dangerous keywords in the main or subquery.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
3. pandasai/helpers/sql_sanitizer.py:65
  • Draft comment:
    The logic here is incorrect. The function should return True for safe queries and False for unsafe ones. The condition should be if parsed.key != "SELECT": to ensure only SELECT queries are considered safe.
  • Reason this comment was not posted:
    Marked as duplicate.
4. tests/unit_tests/helpers/test_sql_sanitizer.py:63
  • Draft comment:
    The assertion here is incorrect. The comment suggests that the query should be unsafe, but the assertion checks for safety. Update the assertion to assert not is_sql_query_safe(query) to match the comment.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_XaZsF1TPw6H2t2kX


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

parsed = sqlglot.parse_one(query)

# Ensure the main query is SELECT or WITH
if parsed.key == "SELECT":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition if parsed.key == "SELECT": return False is incorrect. It should return True for SELECT queries, as they are generally safe. This logic error causes safe SELECT queries to be marked as unsafe.

@@ -197,6 +197,9 @@ def execute_query(self, query: str, params: Optional[list] = None) -> pd.DataFra
load_function = self._get_loader_function(source_type)

try:
if not is_sql_query_safe(formatted_query):
raise MaliciousQueryError("Query is not safe to execute.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's write a test for this!

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 2d4cae7 in 45 seconds

More details
  • Looked at 15 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. pandasai/helpers/sql_sanitizer.py:65
  • Draft comment:
    The check for the main query being "SELECT" is too restrictive. Consider allowing "WITH" clauses as they are often used with SELECT queries.
        if parsed.key.upper() not in ["SELECT", "WITH"]:
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The file is a security-focused SQL sanitizer. WITH clauses are common table expressions that are read-only and can't modify data. However, the code's intent seems to be maximum security by only allowing pure SELECT statements. The change from checking if key == "SELECT" to key.upper() != "SELECT" suggests this restriction was intentional. In a security context, being more restrictive is often the safer choice.
    WITH clauses are indeed safe for read-only operations and blocking them might be overly restrictive for legitimate use cases. The comment raises a valid technical point.
    However, this is a security-focused sanitizer where being overly restrictive is better than being too permissive. The change appears intentional to increase security.
    Delete the comment. While technically correct, the restriction to SELECT-only appears to be an intentional security decision, and we shouldn't second-guess security restrictions without strong justification.
2. pandasai/helpers/sql_sanitizer.py:66
  • Draft comment:
    The error message should be more descriptive. Consider using: "The main query must be a SELECT statement."
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_vGj0Ohvhxjd4yLhu


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 9b50ea0 in 33 seconds

More details
  • Looked at 91 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. pandasai/data_loader/sql_loader.py:43
  • Draft comment:
    Consider providing more details in the error message to help users understand why the query is considered unsafe. For example, include which part of the query triggered the error.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR introduces a security check for SQL queries, but the error message could be more informative.

Workflow ID: wflow_9ajIftmig4KnxG6f


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -38,6 +39,9 @@ def execute_query(self, query: str, params: Optional[list] = None) -> pd.DataFra
formatted_query = self.query_builder.format_query(query)
load_function = self._get_loader_function(source_type)

if not is_sql_query_safe(formatted_query):
raise MaliciousQueryError("Query is not safe to execute.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider rephrasing the error message for clarity and professionalism.

Suggested change
raise MaliciousQueryError("Query is not safe to execute.")
raise MaliciousQueryError("The SQL query is deemed unsafe and will not be executed.")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants