-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Python: Add Pandas SQLi sinks #19594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces explicit SQL injection modeling for pandas.read_sql
and pandas.read_sql_query
, updates existing tests to use an sqlite3
connection for sink detection, and adds a change note entry.
- Updated pandas DataFrame test to connect via
sqlite3
and mark SQL sinks with$getSql
. - Added a
ReadSQLCall
QL class to flag raw SQL queries in Pandas. - Documented the new analysis in the change notes.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
python/ql/test/library-tests/frameworks/pandas/dataframe_query.py | Replaced path-based calls with sqlite3.connect , updated sink markers |
python/ql/src/change-notes/2025-05-26-pandas-sqli-sinks.md | Added entry for Pandas SQLi sinks |
python/ql/lib/semmle/python/frameworks/Pandas.qll | Introduced ReadSQLCall to model read_sql and read_sql_query as sinks |
Comments suppressed due to low confidence (2)
python/ql/test/library-tests/frameworks/pandas/dataframe_query.py:59
- This test covers positional arguments but does not cover keyword argument usage (e.g.,
sql=
andcon=
). Adding a test forpd.read_sql_query(sql="...", con=connection)
would ensure named-parameter sinks are detected.
df = pd.read_sql_query("sql query", connection) # $getSql="sql query"
python/ql/lib/semmle/python/frameworks/Pandas.qll:161
- Unconditionally treating
read_sql
as a sink may flag saferead_sql_table
calls (whichread_sql
can dispatch to). Consider refining the model to only treat calls that route toread_sql_query
or inspect argument patterns to avoid false positives.
ReadSQLCall() { this = API::moduleImport("pandas").getMember(["read_sql", "read_sql_query"]).getACall() }
connection = sqlite3.connect("pets.db") | ||
df = pd.read_sql_query("sql query", connection) # $getSql="sql query" | ||
df.query("query") # $getCode="query" | ||
df.eval("query") # $getCode="query" | ||
|
||
df = pd.read_sql("filepath", 'postgres:///db_name') | ||
connection = sqlite3.connect("pets.db") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The test file creates the same connection
twice. Consider moving connection = sqlite3.connect("pets.db")
to a shared setup block or top-level fixture to avoid duplication.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Short and sweet! Looks good to me. 👍
Thanks @tausbn ! I fixed some nitpicks from automation :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the additional fixes! 🙂
Pandas has a
read_sql
sink, which allows for executing raw SQL queries with untrusted input.read_sql
is a convenience method, that dispatches the query either toread_sql_table
orread_sql_query
. Please note thatread_sql_table
is not vulnerable to SQLi, butread_sql_query
is.Note that instead of adding new tests, I edited the previous tests, because they contained the vulnerable
read_sql
andread_sql_query
method, which made them show up as failed.