This page documents all vulnerability classes that PyAegis detects, the sources and sinks involved, and concrete code examples.
All detectors are driven by the default rule set. You can override or extend any of them with a custom YAML file.
PyAegis currently uses a single unified rule ID (`PYA-TAINT`) for all taint-flow findings. The `description` field specifies the exact sink reached. Granular rule IDs per vulnerability class are planned for a future release.
A source is any expression or call that introduces untrusted, externally-controlled data into the program.

Built-in and environment sources:

| Source | Description |
|---|---|
| `input()` | Interactive user input |
| `sys.argv` | Command-line arguments |
| `os.getenv()` | Environment variable lookup |
| `os.environ.get()` | Environment variable lookup |
| `environ.get()` | Environment variable lookup |

Flask sources:

| Source | Description |
|---|---|
| `request` | Entire Flask request object |
| `request.args` | URL query parameters |
| `request.form` | POST form data |
| `request.values` | Combined GET + POST values |
| `request.data` | Raw request body bytes |
| `request.json` | Parsed JSON body |
| `request.get_json()` | Parsed JSON body (method) |
| `request.headers` | HTTP request headers |
| `request.cookies` | HTTP cookies |
| `request.files` | Uploaded files |
| `request.view_args` | URL route parameters |
| `request.get_data()` | Raw body (method) |

Django sources:

| Source | Description |
|---|---|
| `request.GET` | URL query parameters |
| `request.POST` | POST form data |
| `request.COOKIES` | HTTP cookies |
| `request.FILES` | Uploaded files |
| `request.headers` | HTTP headers |
| `request.body` | Raw body bytes |
| `request.META` | Server and request metadata |

Starlette / FastAPI sources:

| Source | Description |
|---|---|
| `request.query_params` | URL query parameters |
| `request.path_params` | URL path parameters |
| `request.headers` | HTTP headers |
| `request.cookies` | HTTP cookies |
| `request.state` | Custom request state |
| `request.json()` | JSON body |
| `request.form()` | Form data |
| `request.body()` | Raw body |

Parser and legacy input sources:

| Source | Description |
|---|---|
| `json.loads()` | Parsed JSON (user-controlled string) |
| `ujson.loads()` | Fast JSON parser |
| `orjson.loads()` | Fast JSON parser |
| `xmltodict.parse()` | XML-to-dict parser |
| `cgi.FieldStorage` | Legacy CGI form input |
| `web.input()` | web.py input |

A sanitizer is a call that cleans or validates untrusted data. When tainted data passes through a sanitizer, PyAegis considers the output clean and will not report a finding downstream.

| Sanitizer | What it protects against |
|---|---|
| `html.escape()` | XSS / HTML injection |
| `markupsafe.escape()` | XSS / HTML injection |
| `bleach.clean()` | XSS / HTML injection |
| `django.utils.html.escape()` | XSS / HTML injection |
| `flask.escape()` | XSS / HTML injection |
| `xml.sax.saxutils.escape()` | XML injection |
| `os.path.abspath()` | Path traversal (partial) |
| `os.path.normpath()` | Path traversal (partial) |
| `pathlib.Path.resolve()` | Path traversal (partial) |
| `urllib.parse.urlparse()` | SSRF (partial) |
| `validators.url()` | SSRF / URL validation |

!!! note
    Sanitizer detection is heuristic. PyAegis recognizes these specific call patterns. Custom sanitizer functions can be added to the `sanitizers` list in your rules YAML.

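
To illustrate what a custom sanitizer might look like, here is a minimal sketch. The function name `clean_html` is hypothetical (it stands for whatever helper you would list under `sanitizers`); only `html.escape` is a real standard-library call.

```python
import html


def clean_html(value: str) -> str:
    """Hypothetical project sanitizer: escape HTML metacharacters.

    html.escape replaces &, < and >, and with quote=True also " and ',
    so the result can be placed in HTML text or a quoted attribute value.
    """
    return html.escape(value, quote=True)


comment = '<script>alert("x")</script>'
safe_comment = clean_html(comment)
print(safe_comment)
```

Because detection is based on call patterns, listing a function as a sanitizer presumably marks its return value as clean without checking what the function does, so the correctness of the helper remains your responsibility.
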
Severity: CRITICAL
Code injection occurs when untrusted input is passed to a Python code-execution function.

Sinks: `eval`, `exec`, `compile`, `builtins.eval`, `builtins.exec`, `runpy.run_module`, `runpy.run_path`
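
Where the input is expected to be a plain literal (a number, string, list, or dict), `ast.literal_eval` from the standard library is a commonly used replacement for `eval`. The sketch below is illustrative only: the helper name `parse_literal` is an assumption, and this page does not state whether PyAegis treats `ast.literal_eval` as a sanitizer.

```python
import ast


def parse_literal(text: str):
    """Parse a Python literal without executing any code.

    ast.literal_eval accepts only literal syntax (numbers, strings, tuples,
    lists, dicts, sets, booleans, None). Function calls, attribute access,
    and names such as __import__ are rejected.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a literal: {text!r}") from exc


print(parse_literal("[1, 2, 3]"))
```
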

```python
# VULNERABLE
from flask import request

def dangerous():
    expr = request.args.get("expr")  # source
    result = eval(expr)              # sink: code injection
    return str(result)
```

```python
# SAFE: the conversion breaks the taint flow
# (note: calling eval, even on sanitized data, is still bad practice)
from flask import request

def process():
    raw = request.args.get("n")
    n = int(raw)  # type conversion: taint is broken heuristically
    return str(n * 2)
```

Severity: CRITICAL
Command injection occurs when untrusted input is interpolated into a shell command or passed as a command argument.

Sinks: `os.system`, `os.popen`, `os.spawn*`, `subprocess.call`, `subprocess.run`, `subprocess.Popen`, `subprocess.*`, `commands.getoutput`
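
One way to reduce the risk is to validate the value strictly and pass an argument list so that no shell parses it. The following is a minimal sketch, assuming the input should be a literal IP address; `build_ping_argv` and `ping` are hypothetical helper names, and PyAegis would presumably still report the flow unless the validator is listed under `sanitizers` in your rules.

```python
import ipaddress
import subprocess


def build_ping_argv(host: str) -> list[str]:
    """Return an argv list for ping, accepting only literal IP addresses."""
    # ip_address raises ValueError for anything that is not a valid IPv4 or
    # IPv6 literal, which also rejects values such as "-f" or "8.8.8.8; id".
    address = ipaddress.ip_address(host)
    return ["ping", "-c", "1", str(address)]


def ping(host: str) -> int:
    """Run ping and return its exit code."""
    # An argv list with the default shell=False means no shell sees the value.
    return subprocess.run(build_ping_argv(host), check=False).returncode
```

Calling `build_ping_argv("127.0.0.1")` yields a four-element list, while option-like or shell-like input raises `ValueError` before any process is started.
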

```python
# VULNERABLE
import subprocess
from flask import request

def ping():
    host = request.args.get("host")             # source
    subprocess.call(["ping", "-c", "1", host])  # sink: command injection
```

```python
# VULNERABLE (string interpolation)
import os
from flask import request

def run():
    cmd = request.form.get("cmd")  # source
    os.system(f"run_tool {cmd}")   # sink: injection via f-string
```

Severity: CRITICAL
Insecure deserialization occurs when attacker-controlled data is deserialized with `pickle`, `dill`, `marshal`, or an unsafe YAML loader, which can lead to arbitrary code execution.

Sinks: `pickle.loads`, `pickle.load`, `cPickle.loads`, `dill.loads`, `marshal.loads`, `yaml.load`, `yaml.unsafe_load`, `ruamel.yaml.load`, `jsonpickle.decode`

```python
# VULNERABLE
import pickle
from flask import request

def load_session():
    data = request.cookies.get("session")  # source
    obj = pickle.loads(data.encode())       # sink: insecure deserialization
    return obj
```

!!! warning
    `yaml.load()` without an explicit `Loader=yaml.SafeLoader` is dangerous and will be flagged. Use `yaml.safe_load()` instead.

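
A data-only format avoids the code-execution risk entirely. The sketch below swaps `pickle` for JSON in the session example; the helper name `load_session` and the "must be an object" rule are assumptions made for illustration.

```python
import json


def load_session(raw_cookie: str) -> dict:
    """Decode a session cookie as JSON instead of unpickling it."""
    try:
        session = json.loads(raw_cookie)
    except json.JSONDecodeError as exc:
        raise ValueError("session cookie is not valid JSON") from exc
    # JSON can only produce dicts, lists, strings, numbers, booleans and
    # None, so no arbitrary objects are constructed during parsing.
    if not isinstance(session, dict):
        raise ValueError("session cookie must decode to an object")
    return session


print(load_session('{"user": "alice"}'))
```

The decoded values are still attacker-controlled (`json.loads()` is itself listed as a source on this page), so they need validation, and ideally a signature check, before use.
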
Severity: HIGH
Server-side request forgery (SSRF) occurs when user-controlled input determines the URL of an outbound HTTP request, allowing attackers to probe internal services.

Sinks: `requests.get`, `requests.post`, `requests.request`, `httpx.get`, `httpx.post`, `httpx.request`, `urllib.request.urlopen`, `urllib3.PoolManager.request`, `urllib3.request`, `aiohttp.ClientSession.get`, `aiohttp.ClientSession.post`, `aiohttp.ClientSession.request`, `socket.create_connection`
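
A common mitigation is to check the destination against an allowlist before making the request. This is a minimal sketch under stated assumptions: the host names in `ALLOWED_HOSTS` and the helper `is_allowed_url` are hypothetical, and the check covers only scheme and host.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your service really needs.
ALLOWED_HOSTS = frozenset({"api.example.com", "cdn.example.com"})


def is_allowed_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lower-cased, without userinfo or port
    except ValueError:  # for example, a malformed IPv6 literal
        return False
    if parts.scheme not in ("http", "https"):
        return False
    return host is not None and host in ALLOWED_HOSTS


print(is_allowed_url("https://api.example.com/v1/items"))
print(is_allowed_url("http://169.254.169.254/latest/meta-data/"))
```

An allowlist check does not cover redirects: a permitted host can still redirect to an internal address, so redirects should be disabled or each hop re-validated.
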

```python
# VULNERABLE
import requests
from flask import request

def fetch():
    url = request.args.get("url")  # source
    resp = requests.get(url)       # sink: SSRF
    return resp.text
```

Severity: HIGH
Path traversal occurs when user input controls a file path, allowing access outside the intended directory (for example, `../../etc/passwd`).

Sinks: `open`, `builtins.open`, `os.open`, `os.remove`, `os.unlink`, `os.rmdir`, `os.rename`, `os.replace`, `os.mkdir`, `os.makedirs`, `shutil.copy`, `shutil.copyfile`, `shutil.copytree`, `shutil.move`, `shutil.rmtree`, `pathlib.Path`, `pathlib.Path.open`, `pathlib.Path.write_text`, `pathlib.Path.write_bytes`, `tempfile.NamedTemporaryFile`
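
Normalizing a path is only a partial defense, because a normalized path can still point anywhere on the filesystem. A fuller check resolves the path and confirms that it stays under a base directory. The sketch below assumes Python 3.9+ (for `Path.is_relative_to`) and uses `/var/data` as the base directory; `resolve_under_base` is a hypothetical helper, not part of PyAegis.

```python
from pathlib import Path

BASE_DIR = Path("/var/data")  # assumed base directory for this sketch


def resolve_under_base(filename: str, base_dir: Path = BASE_DIR) -> Path:
    """Resolve filename under base_dir, rejecting paths that escape it."""
    base = base_dir.resolve()
    # Joining with an absolute filename discards base, and ".." segments are
    # collapsed by resolve(); both cases are caught by the check below.
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes {base}: {filename!r}")
    return candidate


print(resolve_under_base("reports/q1.txt"))
```

A helper like this could be registered under `sanitizers` in a custom rules file so that PyAegis treats its return value as clean.
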

```python
# VULNERABLE
from flask import request

def read_file():
    filename = request.args.get("file")       # source
    with open(f"/var/data/{filename}") as f:  # sink: path traversal
        return f.read()
```

```python
# SAFE (as seen by PyAegis): os.path.normpath + abspath act as sanitizers.
# Protection is partial: normalizing a path does not confine it to a
# base directory.
import os
from flask import request

def read_file_safe():
    filename = request.args.get("file")
    safe_path = os.path.abspath(os.path.normpath(filename))  # sanitizer
    with open(safe_path) as f:
        return f.read()
```

Severity: CRITICAL
SQL injection occurs when user-controlled strings are concatenated into SQL queries without parameterization.

Sinks: `sqlite3.connect`, `sqlite3.Connection.execute`, `sqlite3.Cursor.execute`, `sqlite3.Cursor.executemany`, `psycopg2.connect`, `psycopg2.cursor.execute`, `MySQLdb.connect`, `pymysql.connect`, `sqlalchemy.text`
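
To show why parameterization helps, here is a small self-contained sketch using an in-memory SQLite database. The table layout and the helper `find_users` are made up for illustration.

```python
import sqlite3


def find_users(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Look up users by exact name using a bound parameter."""
    # The ? placeholder sends name as data; it is never parsed as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_users(conn, "alice"))         # one matching row
print(find_users(conn, "x' OR '1'='1"))  # no rows: payload compared as a literal
```

With string interpolation the second call would match every row; with a bound parameter the payload is just an unusual name that matches nothing.
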

```python
# VULNERABLE
import sqlite3
from flask import request

def search():
    name = request.args.get("name")  # source
    conn = sqlite3.connect("app.db")
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM users WHERE name='{name}'")  # sink: SQL injection
    return cur.fetchall()
```

```python
# SAFE: use parameterized queries
cur.execute("SELECT * FROM users WHERE name=?", (name,))
```

Severity: CRITICAL
Server-side template injection (SSTI) occurs when user input is rendered as a template string, allowing attackers to execute arbitrary expressions in the template engine.

Sinks: `jinja2.Template`, `jinja2.Environment.from_string`, `mako.template.Template`

```python
# VULNERABLE
from jinja2 import Template
from flask import request

def render():
    tmpl = request.args.get("tmpl")  # source
    t = Template(tmpl)               # sink: SSTI
    return t.render()
```

!!! danger
    SSTI in Jinja2 can escalate to full RCE. Always render with a fixed template and pass user data as context variables, never as the template string itself.

Severity: HIGH
XML external entity (XXE) injection occurs when user-supplied XML is parsed with an XML library that expands external entities, potentially reading local files or triggering SSRF.

Sinks: `xml.etree.ElementTree.parse`, `xml.etree.ElementTree.fromstring`, `lxml.etree.parse`, `lxml.etree.fromstring`, `xml.dom.minidom.parse`, `xml.dom.minidom.parseString`

```python
# VULNERABLE
from xml.etree import ElementTree as ET
from flask import request

def parse_xml():
    data = request.get_data()  # source
    tree = ET.fromstring(data)  # sink: XXE
    return tree.find("name").text
```

Severity: MEDIUM
Regular expression denial of service (ReDoS) occurs when user-controlled input is compiled as a regex pattern or matched against a complex pattern, potentially causing catastrophic backtracking.

Sinks: `re.compile`, `re.match`, `re.search`
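
When the user-supplied value is meant to be searched for literally, escaping it with `re.escape` removes its power to act as a pattern. This is a minimal sketch; `contains_term` is a hypothetical helper, and this page does not say whether PyAegis treats `re.escape` as a sanitizer.

```python
import re


def contains_term(term: str, text: str) -> bool:
    """Search for user-supplied text literally, not as a regex pattern."""
    # re.escape backslash-escapes regex metacharacters, so "(a+)+$" is
    # matched as those six characters rather than as a nested quantifier.
    return re.search(re.escape(term), text) is not None


print(contains_term("a.c", "xxa.cxx"))
print(contains_term("a.c", "abc"))
```

For a plain substring test, `term in text` is simpler still; escaping is mainly useful when the value has to be embedded in a larger, fixed pattern.
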

```python
# VULNERABLE
import re
from flask import request

def validate():
    pattern = request.args.get("pattern")  # source
    if re.match(pattern, "test"):          # sink: ReDoS
        return "match"
    return "no match"
```

PyAegis propagates taint through the following expression types:

| Expression | Behaviour |
|---|---|
| `x = source()` | `x` becomes tainted |
| `y = x` | `y` becomes tainted if `x` is tainted |
| `z = f"{x} literal"` | `z` becomes tainted (f-string) |
| `z = x + " suffix"` | `z` becomes tainted (string concat) |
| `z = x % fmt` | `z` becomes tainted (%-format) |
| `z = [x, y]` | `z` becomes tainted if any element is tainted |
| `z = {"k": x}` | `z` becomes tainted if any value is tainted |
| `z = x.attr` | `z` becomes tainted if `x` is tainted |
| `z = x[key]` | `z` becomes tainted if `x` is tainted |
| `z += x` | `z` becomes tainted if `x` is tainted |
| `z = sanitizer(x)` | `z` is clean regardless of `x` |
| `z = local_fn(x)` | inter-procedural: `z` tainted if `local_fn` returns tainted given `x` |

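
The rules in the table can be approximated with a toy propagator built on the standard `ast` module. This sketch is not PyAegis's implementation: it handles only top-level assignments, skips inter-procedural analysis, and uses small illustrative `SOURCES` and `SANITIZERS` sets. It also treats dictionary keys and subscript indices like any other sub-expression, which is slightly broader than the table.

```python
import ast

SOURCES = {"input", "request.args.get"}  # illustrative subset
SANITIZERS = {"html.escape"}             # illustrative subset


def dotted_name(node: ast.AST) -> str:
    """Return 'a.b.c' for Name/Attribute chains, '' for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        prefix = dotted_name(node.value)
        return f"{prefix}.{node.attr}" if prefix else ""
    return ""


def is_tainted(node: ast.AST, tainted: set[str]) -> bool:
    """Decide whether an expression carries taint."""
    if isinstance(node, ast.Call):
        name = dotted_name(node.func)
        if name in SANITIZERS:
            return False  # z = sanitizer(x): clean regardless of x
        if name in SOURCES:
            return True   # x = source()
        # Unknown call: tainted if the receiver or any positional argument is.
        return is_tainted(node.func, tainted) or any(
            is_tainted(arg, tainted) for arg in node.args
        )
    if isinstance(node, ast.Name):
        return node.id in tainted  # y = x
    # f-strings, concatenation, %-format, containers, x.attr and x[key]:
    # tainted if any sub-expression is tainted.
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(node))


def tainted_names(source_code: str) -> set[str]:
    """Walk top-level statements in order and collect tainted variable names."""
    tainted: set[str] = set()
    for stmt in ast.parse(source_code).body:
        if isinstance(stmt, ast.Assign):
            targets = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
            if is_tainted(stmt.value, tainted):
                tainted.update(targets)
            else:
                tainted.difference_update(targets)  # reassigned to clean data
        elif isinstance(stmt, ast.AugAssign) and isinstance(stmt.target, ast.Name):
            if is_tainted(stmt.value, tainted):
                tainted.add(stmt.target.id)  # z += x
    return tainted


sample = '''
x = input()
y = x
z = f"{y} literal"
items = [z, "ok"]
first = items[0]
clean = html.escape(first)
other = "constant"
'''
print(sorted(tainted_names(sample)))  # ['first', 'items', 'x', 'y', 'z']
```

A real analyzer additionally has to follow control flow, function bodies, and calls into local functions, which is where most of the complexity lies.
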
To detect a custom sink or add a framework-specific source:

```yaml
# custom_rules.yml
inputs:
  - my_framework.get_user_input
  - my_framework.Request.body
sinks:
  - my_dangerous_exec
  - my_framework.shell_run
  - legacy_lib.*
sanitizers:
  - my_project.utils.clean_html
  - my_project.validators.validate_path
```

```bash
pyaegis ./src --rules custom_rules.yml
```

Glob patterns (`*`, `?`, `[seq]`) follow Python's `fnmatch` module semantics.
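
To see how such patterns behave, the sketch below matches a few qualified names against the sink patterns from the example rules. It assumes case-sensitive matching via `fnmatch.fnmatchcase`; which `fnmatch` function PyAegis calls internally is not stated here, and `matches_any` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

SINK_PATTERNS = ["my_dangerous_exec", "my_framework.shell_run", "legacy_lib.*"]


def matches_any(qualified_name: str, patterns: list[str]) -> bool:
    """Return True if the name matches at least one glob pattern."""
    return any(fnmatchcase(qualified_name, pattern) for pattern in patterns)


# "*" matches any run of characters, including dots, so nested names match.
print(matches_any("legacy_lib.db.run_query", SINK_PATTERNS))
# The pattern requires the "legacy_lib." prefix, so these do not match.
print(matches_any("legacy_lib", SINK_PATTERNS))
print(matches_any("legacy_library.run", SINK_PATTERNS))
```

Note that `*` is not limited to a single dotted segment, so a pattern such as `legacy_lib.*` covers every function in every submodule of that package.
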