Problem
Hi! I just read an interesting article on how bad actors can evade text-based static analysis tools using Unicode. Since PEP 3131, Python has allowed non-ASCII characters in identifiers so that developers can "define classes and functions with names in their native languages". As a consequence, there are now many ways to spell a name like `eval`. (See: https://lingojam.com/BoldTextGenerator)
Proposal
Guarddog could preprocess all source files by converting Unicode look-alike characters to their ASCII equivalents, mirroring what the Python parser itself does. According to PEP 3131, "All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC."
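A minimal sketch of the preprocessing idea, assuming Python 3.9+ (for `ast.unparse`). Rather than rewriting raw text, it lets CPython's own parser apply the NFKC normalization to identifiers and then regenerates source from the AST. `normalize_source` is a hypothetical helper, not part of Guarddog. Comments and formatting are lost in the round trip, and a file that fails to parse would need some fallback.

```python
import ast
import unicodedata


def normalize_source(source: str) -> str:
    """Hypothetical helper: return `source` with every identifier NFKC-normalized.

    ast.parse applies the PEP 3131 NFKC normalization to identifiers while
    leaving string literal values untouched; ast.unparse (Python 3.9+) then
    regenerates equivalent source. ast.parse only parses, it never executes
    the scanned code. Comments and formatting are not preserved.
    """
    return ast.unparse(ast.parse(source))


# NFKC maps the mathematical bold letter back to a plain ASCII "e".
print(unicodedata.normalize("NFKC", "𝐞val"))  # eval

# The bold spelling is rewritten to the name the interpreter actually resolves...
print(normalize_source("𝐞val('1+1')"))  # eval('1+1')

# ...while non-ASCII text inside a string literal is kept as-is.
print(normalize_source("x = '𝐞'") == "x = '𝐞'")  # True
```

Because the normalization comes from the parser rather than a hand-written replacement table, the output should match how the interpreter itself resolves the names.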
Alternatively, Guarddog could define a new heuristic that warns if non-ASCII characters are found.
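One possible shape for such a heuristic, sketched with the standard-library `tokenize` module (Python 3.9+ for the type hints). Flagging only identifiers (NAME tokens) is an assumption made here so that non-ASCII text in string literals and comments does not raise warnings. `find_non_ascii_identifiers` is a hypothetical name, and this standalone function would need adapting to however Guarddog's heuristics are actually implemented.

```python
import io
import tokenize
import unicodedata


def find_non_ascii_identifiers(source: str) -> list[tuple[int, int, str, str]]:
    """Hypothetical heuristic: report identifiers that contain non-ASCII characters.

    Returns (line, column, raw identifier, NFKC-normalized identifier) tuples,
    so a caller can check the normalized form against dangerous names like eval.
    """
    findings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only NAME tokens are inspected; STRING and COMMENT tokens are ignored.
        if tok.type == tokenize.NAME and not tok.string.isascii():
            line, col = tok.start
            normalized = unicodedata.normalize("NFKC", tok.string)
            findings.append((line, col, tok.string, normalized))
    return findings
```

For the input `x = 1` followed by `𝐞val('1+1')`, this reports the bold identifier on line 2 together with its normalized form `eval`, while a bold `𝐞` inside a string literal or comment produces no finding.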
Test
Generate a bold Unicode variant of the letter `e` to obtain `𝐞` (U+1D41E MATHEMATICAL BOLD SMALL E).
Append the following code to `tests/analyzer/sourcecode/code-execution.py`:
Run the rule tests with:
`semgrep --metrics off --test --config guarddog/analyzer/sourcecode tests/analyzer/sourcecode`
I've seen this solved in a few ways, one of them being what you suggest. The preprocessing/replacement part can be tricky, as it could break functionality if you incorrectly replace a piece of Unicode.
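To illustrate the risk of replacing Unicode incorrectly, here is a short Python sketch (an illustration, not taken from Guarddog or any other tool). Applying NFKC to the whole file as one string, instead of only to identifiers, also rewrites string literals, so the normalized program no longer behaves like the original. The input is a made-up example.

```python
import ast
import unicodedata

# Hypothetical input: a string literal that legitimately contains
# compatibility characters (the "ﬁ" ligature and the numero sign).
source = "label = 'ﬁle №1'\n"

# Naive preprocessing: NFKC-normalize the entire file as one string.
naive = unicodedata.normalize("NFKC", source)
print(naive.strip())  # label = 'file No1'

# The literal's runtime value changed, so the program's behavior changed too.
original_value = ast.parse(source).body[0].value.value
naive_value = ast.parse(naive).body[0].value.value
print(original_value == naive_value)  # False
```

Restricting normalization to identifiers, as the parser does, avoids this particular breakage.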