Skip to content

Python: Modernize iter not returning self query #19554

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
ql/python/ql/src/Functions/IterReturnsNonSelf.ql
ql/python/ql/src/Functions/NonCls.ql
ql/python/ql/src/Functions/NonSelf.ql
ql/python/ql/src/Functions/ReturnConsistentTupleSizes.ql
Expand Down
27 changes: 10 additions & 17 deletions python/ql/src/Functions/IterReturnsNonSelf.qhelp
Original file line number Diff line number Diff line change
Expand Up @@ -3,34 +3,27 @@
"qhelp.dtd">
<qhelp>
<overview>
<p>The <code>__iter__</code> method of an iterator should return self.
This is important so that iterators can be used as sequences in any context
that expect a sequence. To do so requires that <code>__iter__</code> is
idempotent on iterators.</p>

<p>
Note that sequences and mapping should return a new iterator, it is just the returned
iterator that must obey this constraint.
<p>Iterator classes (classes defining a <code>__next__</code> method) should have an <code>__iter__</code> method that returns the iterator itself.
This ensures that the object is also an iterable; and behaves as expected when used anywhere an iterator or iterable is expected, such as in <code>for</code> loops.
</p>



</overview>
<recommendation>
<p>Make the <code>__iter__</code> return self unless the class should not be an iterator,
in which case rename the <code>next</code> (Python 2) or <code>__next__</code> (Python 3)
to something else.</p>
<p>Ensure that the <code>__iter__</code> method returns <code>self</code>, or is otherwise equivalent as an iterator to <code>self</code>.</p>

</recommendation>
<example>
<p>In this example the <code>Counter</code> class's <code>__iter__</code> method does not
return self (or even an iterator). This will cause the program to fail when anyone attempts
to use the iterator in a <code>for</code> loop or <code>in</code> statement.</p>
<sample src="IterReturnsNonSelf.py" />
<p>In the following example, the <code>MyRange</code> class's <code>__iter__</code> method does not return <code>self</code>.
This would lead to unexpected results when used with a <code>for</code> loop or <code>in</code> statement.</p>
<sample src="examples/IterReturnsNonSelf.py" />

</example>
<references>

<li>Python Language Reference: <a href="http://docs.python.org/2.7/reference/datamodel.html#object.__iter__">object.__iter__</a>.</li>
<li>Python Standard Library: <a href="http://docs.python.org/2/library/stdtypes.html#typeiter">Iterators</a>.</li>
<li>Python Language Reference: <a href="http://docs.python.org/3/reference/datamodel.html#object.__iter__">object.__iter__</a>.</li>
<li>Python Standard Library: <a href="http://docs.python.org/3/library/stdtypes.html#typeiter">Iterators</a>.</li>


</references>
Expand Down
78 changes: 69 additions & 9 deletions python/ql/src/Functions/IterReturnsNonSelf.ql
Original file line number Diff line number Diff line change
Expand Up @@ -4,27 +4,87 @@
* @kind problem
* @tags reliability
* correctness
* quality
* @problem.severity error
* @sub-severity low
* @precision high
* @id py/iter-returns-non-self
*/

import python
import semmle.python.ApiGraphs

Function iter_method(ClassValue t) { result = t.lookup("__iter__").(FunctionValue).getScope() }
/** Gets the __iter__ method of `c`. */
Function iterMethod(Class c) { result = c.getAMethod() and result.getName() = "__iter__" }

predicate is_self(Name value, Function f) { value.getVariable() = f.getArg(0).(Name).getVariable() }
/** Gets the `__next__` method of `c`. */
Function nextMethod(Class c) { result = c.getAMethod() and result.getName() = "__next__" }

predicate returns_non_self(Function f) {
/** Holds if `var` is a variable referring to the `self` parameter of `f`. */
predicate isSelfVar(Function f, Name var) { var.getVariable() = f.getArg(0).(Name).getVariable() }

/** Holds if `e` is an expression that an iter function `f` should return. */
predicate isGoodReturn(Function f, Expr e) {
isSelfVar(f, e)
or
exists(DataFlow::CallCfgNode call, DataFlow::AttrRead read, DataFlow::Node selfNode |
e = call.asExpr()
|
call = API::builtin("iter").getACall() and
call.getArg(0) = read and
read.accesses(selfNode, "__next__") and
isSelfVar(f, selfNode.asExpr()) and
call.getArg(1).asExpr() instanceof None
)
}

/** Holds if the iter method `f` does not return `self` or an equivalent. */
predicate returnsNonSelf(Function f) {
exists(f.getFallthroughNode())
or
exists(Return r | r.getScope() = f and not is_self(r.getValue(), f))
exists(Return r | r.getScope() = f and not isGoodReturn(f, r.getValue()))
}

/** Holds if `iter` and `next` methods are wrappers around some field. */
predicate iterWrapperMethods(Function iter, Function next) {
exists(string field |
exists(Return r, DataFlow::Node self, DataFlow::AttrRead read |
r.getScope() = iter and
r.getValue() = [iterCall(read).asExpr(), read.asExpr()] and
read.accesses(self, field) and
isSelfVar(iter, self.asExpr())
) and
exists(Return r, DataFlow::Node self, DataFlow::AttrRead read |
r.getScope() = next and
r.getValue() = nextCall(read).asExpr() and
read.accesses(self, field) and
isSelfVar(next, self.asExpr())
)
)
}

/** Gets a call to `iter(arg)` or `arg.__iter__()`. */
private DataFlow::CallCfgNode iterCall(DataFlow::Node arg) {
result.(DataFlow::MethodCallNode).calls(arg, "__iter__")
or
result = API::builtin("iter").getACall() and
arg = result.getArg(0) and
not exists(result.getArg(1))
}

/** Gets a call to `next(arg)` or `arg.__next__()`. */
private DataFlow::CallCfgNode nextCall(DataFlow::Node arg) {
result.(DataFlow::MethodCallNode).calls(arg, "__next__")
or
exists(Return r | r.getScope() = f and not exists(r.getValue()))
result = API::builtin("next").getACall() and
arg = result.getArg(0)
}

from ClassValue t, Function iter
where t.isIterator() and iter = iter_method(t) and returns_non_self(iter)
select t, "Class " + t.getName() + " is an iterator but its $@ method does not return 'self'.",
iter, iter.getName()
from Class c, Function iter, Function next
where
next = nextMethod(c) and
iter = iterMethod(c) and
returnsNonSelf(iter) and
not iterWrapperMethods(iter, next)
Comment on lines +85 to +88
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This where clause requires that there be both an __iter__ and a __next__ method present. What if only the __next__ method is present? It might make sense to also flag this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially useful. The alert for that case would have to be different (doesn't link to the __iter__ method) which is a little trickier to handle.

select iter, "Iter method of iterator $@ does not return `" + iter.getArg(0).getName() + "`.", c,
c.getName()
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,10 @@ def __init__(self, low, high):
self.high = high

def __iter__(self):
return self.current
return (self.current, self.high) # BAD: does not return `self`.

def next(self):
def __next__(self):
if self.current > self.high:
raise StopIteration
return None
self.current += 1
return self.current - 1
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
category: minorAnalysis
---
* The `py/iter-returns-non-self` query has been modernized, and no longer alerts for certain cases where an equivalent iterator is returned.

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
| test.py:5:5:5:23 | Function __iter__ | Iter method of iterator $@ does not return `self`. | test.py:1:1:1:11 | Class Bad1 | Bad1 |
| test.py:51:5:51:23 | Function __iter__ | Iter method of iterator $@ does not return `self`. | test.py:42:1:42:21 | Class FalsePositive1 | FalsePositive1 |
53 changes: 53 additions & 0 deletions python/ql/test/query-tests/Functions/iterators/test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
class Bad1:
def __next__(self):
return 0

def __iter__(self): # BAD: Iter does not return self
yield 0

class Good1:
def __next__(self):
return 0

def __iter__(self): # GOOD: iter returns self
return self

class Good2:
def __init__(self):
self._it = iter([0,0,0])

def __next__(self):
return next(self._it)

def __iter__(self): # GOOD: iter and next are wrappers around a field
return self._it.__iter__()

class Good3:
def __init__(self):
self._it = iter([0,0,0])

def __next__(self):
return self._it.__next__()

def __iter__(self): # GOOD: iter and next are wrappers around a field
return self._it

class Good4:
def __next__(self):
return 0

def __iter__(self): # GOOD: this is an equivalent iterator to `self`.
return iter(self.__next__, None)

class FalsePositive1:
def __init__(self):
self._it = None

def __next__(self):
if self._it is None:
self._it = iter(self)
return next(self._it)

def __iter__(self): # SPURIOUS, GOOD: implementation of next ensures the iterator is equivalent to the one returned by iter, but this is not detected.
yield 0
yield 0