In this chapter the notions of recursion in the context of the SQL language are introduced. Graded assignments include one (1) quiz set, for which answers are provided below.
The quiz is really short and tests really basic stuff about how basic notions of recursion work. Bear in mind that recursion is hard both to grasp and use effectively. If you think you don't get how something recursive works, my personal advice is to create a mock up example and solve it step by step.
Let's consider a table T(A)
containing a set of positive integers with no duplicates,
and the following recursive SQL query. Note that this query includes nonlinear recursion,
which technically is not permitted in the strict SQL standard.
With Recursive Mystery(X,Y) As
(Select A As X, A As Y From T
Union
Select m1.X, m2.Y
From Mystery m1, Mystery m2
Where m2.X = m1.Y + 1)
Select Max(X-Y) + 1 From Mystery
While the definition looks complicated, the query in fact computes a property of T
that can be
stated very succinctly. First try to determine what Mystery is computing from T
. Then choose
which of the following is a correct statement about the final query result.
In my instance I had the following options to choose from:
- A: If
T
= {1, 5, 9, 10, 11, 12, 15} then the query returns 3. - B: If
T
= {1, 3, 4, 5, 10, 11, 12} then the query returns 2. - C: If
T
= {2, 4, 5, 6, 8, 10, 11} then the query returns 3. - D: If
T
= {1, 5, 9, 10, 12, 15} then the query returns 6.
Now this is tricky because of two facts. First one being recursion itself, but more
importantly the operation in Max
is wrong... there must have been a typo there as the
answers back this evidence as you'll see in a bit. In the original text the operation
inside Max
was Max(Y-X)
but it reality it should be Max(X-Y)
, which gives the
correct result (or an abs
would fix this... but that's not the point).
What this recursion calculates is the max distance between two numbers incremented by
one (in order to include also the starting point). Let's see how this might go, first
here is out T
table:
T |
---|
2 |
4 |
5 |
6 |
8 |
10 |
11 |
Firstly here is the initial state of our recursion on round 1:
M.X | M.Y |
---|---|
2 | 2 |
4 | 4 |
5 | 5 |
6 | 6 |
8 | 8 |
10 | 10 |
11 | 11 |
Now let's construct the M1
and M2
tables for this round, which is shown below.
M1.X | M1.Y | M2.X | M2.Y |
---|---|---|---|
2 | 2 | 2 | 2 |
4 | 4 | 4 | 4 |
5 | 5 | 5 | 5 |
6 | 6 | 6 | 6 |
8 | 8 | 8 | 8 |
10 | 10 | 10 | 10 |
11 | 11 | 11 | 11 |
Now recall that our join condition is where m2.X = m1.Y + 1
which are essentially
the following tuples:
M1.X | M1.Y | M2.X | M2.Y |
---|---|---|---|
5 | 5 | 4 | 4 |
6 | 6 | 5 | 5 |
11 | 11 | 10 | 10 |
From them we have to take M1.X
and M2.Y
which are added back to our Mystrery
table,
so in the initial step of round 2 of the recursion Mystery
table has the following
tuples:
M.X | M.Y |
---|---|
2 | 2 |
4 | 4 |
5 | 5 |
6 | 6 |
8 | 8 |
10 | 10 |
11 | 11 |
---- | ---- |
5 | 4 |
6 | 5 |
11 | 10 |
Moving on we select the tuples, which are matched; these are the following:
M1.X | M1.Y | M2.X | M2.Y |
---|---|---|---|
6 | 5 | 5 | 4 |
So after this step our Mystery
table has the following tuples:
M.X | M.Y |
---|---|
2 | 2 |
4 | 4 |
5 | 5 |
6 | 6 |
8 | 8 |
10 | 10 |
11 | 11 |
5 | 4 |
6 | 5 |
11 | 10 |
---- | ---- |
6 | 4 |
We can easily see (I'll leave that as an exercise for you) that there are no new tuples
added after we reach this state. Taking the X-Y
operation yields and then the Max
yields
2
, which is when incremented by 1
to give the final result of 3
.
Hence the correct answer is C, e.g. when T
= {2, 4, 5, 6, 8, 10, 11}
then the query returns 3.
Consider a relation Manager(manager,employee) where a tuple (m,e)
in Manager specifies that
person m
is the manager of person e
. The only key for Manager is both attributes together.
The following recursive SQL query computes the relation Peer(X,Y)
.
With Recursive Peer(X,Y) As
(Select M1.employee, M2.employee
From Manager M1, Manager M2
Where M1.manager = M2.manager AND M1.employee < M2.employee
Union
Select M1.employee, M2.employee
From Peer S, Manager M1, Manager M2
Where S.X = M1.manager AND S.Y = M2.manager
And M1.employee < M2.employee)
Select * from Peer
Given the above statement, suppose the Manager has the following snapshot:
manager | employee |
---|---|
10 | 9 |
10 | 8 |
9 | 7 |
9 | 6 |
8 | 6 |
8 | 5 |
7 | 4 |
7 | 3 |
6 | 3 |
6 | 2 |
5 | 2 |
5 | 1 |
Consider the computation of Peer in the recursive query. Let the base case -- the first term of the Union -- be "round 1." Let each subsequent round of the recursion be "round 2," "round 3," and so on. Which of the following is a correct statement about when a pair gets added to Peer?
Hint: You may find it helpful to draw a figure that shows the Manager relationships.
In my instance I had the following options to choose from:
- A:
(3,5)
is added in round 2. - B:
(1,4)
is added in round 2. - C:
(1,4)
is added in round 3. - D:
(6,7)
is added in round 2.
In order to better understand how the recursion advances we will look at the manager hierarchy, which is illustrated in the following figure (you can also download a vector version of this here).
Now, observe the that fact that in each step the recursion descends one level of the manager
tree and locates the peer tuples which are added to the final result. Thus in iteration level
1 we have the employees that have the same manager marked as peers; thus the initial
seed Peer
table for our recursion is the following:
Peer1
:
X | Y |
---|---|
9 | 8 |
7 | 6 |
6 | 5 |
4 | 3 |
3 | 2 |
2 | 1 |
Intuitively, as the recursion in SQL only stops when a fixed point is reach (e.g. no new tuples are added to the recursive relation) we have to show the steps that this is achieved and the correct answer will probably show along the way as well.
Round 2 of the recursion has the following table:
Peer2
:
X | Y |
---|---|
9 | 8 |
7 | 6 |
6 | 5 |
4 | 3 |
3 | 2 |
2 | 1 |
--- | --- |
5 | 7 |
5 | 7 |
6 | 7 |
We see that there is only three tuples added in this round, these pairs indicate that these employees are peers. Now let's move onto round three of the recursion which has the following table:
Peer3
:
X | Y |
---|---|
9 | 8 |
7 | 6 |
6 | 5 |
4 | 3 |
3 | 2 |
2 | 1 |
6 | 7 |
5 | 7 |
5 | 6 |
--- | --- |
1 | 2 |
1 | 3 |
1 | 4 |
2 | 3 |
2 | 4 |
3 | 4 |
We see a lot more tuples that are added now; this is due to the fan out of the hierarchy tree
as we go down, but the principle is the same. Now take a look at our options and in this
step we will see that the tuple (1, 4)
is added in this round. Hence it also the
correct answer, which is option C.
For the final question let's consider a relation Parent(par,child)
, where a tuple (p,c)
in Parent specifies that person p
is the parent of person c
. The only key for Parent
is both attributes together. We are interested in writing a recursive SQL query to find
all descendants of the person named "Eve." Here are six possible definitions of a
recursive relation Ancestor(X,Y)
. Note that some of the definitions include nonlinear
recursion, which technically is not permitted in the strict SQL standard.
1: With Recursive Ancestor(X,Y) As
(Select par, child From Parent
Union
Select Ancestor.X, Parent.child
From Ancestor, Parent
Where Ancestor.Y = Parent.par)
2: With Recursive Ancestor(X,Y) As
(Select par, child From Parent Where par = 'Eve'
Union
Select Ancestor.X, Parent.child
From Ancestor, Parent
Where Ancestor.Y = Parent.par)
3: With Recursive Ancestor(X,Y) As
(Select par, child From Parent
Union
Select Parent.par, Ancestor.Y
From Parent, Ancestor
Where Parent.child = Ancestor.X)
4: With Recursive Ancestor(X,Y) As
(Select par, child From Parent
Union
Select Parent.par, Ancestor.Y
From Parent, Ancestor
Where Parent.child = Ancestor.X and Parent.par = 'Eve')
5: With Recursive Ancestor(X,Y) As
(Select par, child From Parent Where par = 'Eve'
Union
Select A1.X, A2.Y
From Ancestor A1, Ancestor A2
Where A1.Y = A2.X)
6: With Recursive Ancestor(X,Y) As
(Select par, child From Parent
Union
Select A1.X, A2.Y
From Ancestor A1, Ancestor A2
Where A1.Y = A2.X and A1.X = 'Eve')
Consider two possible queries that can be used to complete any of the WITH
statements (1)-(6):
A: Select Y From Ancestor
B: Select Y From Ancestor Where X = 'Eve'
Which of the following combinations correctly computes the descendants of "Eve"?
In my instance I had the following options to choose from:
- A: (B) gives the correct answer with (1), (2), and (3) only.
- B: The only choices for which both (A) and (B) both give the correct answer are (1) and (2).
- C: (B) gives the correct answer with (1), (2), (3), and (6) only.
- D: Neither (A) nor (B) give the correct answer with (4) and (6).
This can be easily answers from the lectures as this is the query that's used in them. Given the above options the correct answer is C, (B) gives the correct answer with (1), (2), (3), and (6) only. Moreover, the only query that gives the correct answer whether we use (A) or (B) is (2).
I won't talk why these give the correct answer, as it's really straightforward but I'll
briefly talk why (4) and (5) don't give the correct answer when using (B) or (A). Now on (4)
observe that inside where
we filter the parents using Eve
and we select Parent.Par
to be our X
column from Ancestor
and the Y
column to be Ancestor.Y
. When done,
that would only give us the result until the grandchildren of Eve
. Concretely, consider
this example:
Let parent
have the following snapshot:
p | c |
---|---|
Eve | John |
Eve | Adam |
Adam | Bruce |
Adam | Coco |
Coco | Jo |
Jo | Doe |
Now in the initial relation for the recursion seed we would have:
Ancestor
on round 1
X | Y |
---|---|
Eve | John |
Eve | Adam |
Adam | Bruce |
Adam | Coco |
Coco | Jo |
Jo | Doe |
While parent would always have:
Parent
after filtering
p | c |
---|---|
Eve | John |
Eve | Adam |
Now moving onto round two of the recursion we have the following snapshot for Ancestor
:
Ancestor
on round 2
X | Y |
---|---|
Eve | John |
Eve | Adam |
Adam | Bruce |
Adam | Coco |
Coco | Jo |
Jo | Doe |
---- | ----- |
Eve | Bruce |
Eve | Coco |
And this would stop there, as when joining with Parent
no new descendants
Now moving onto (5), we can clearly see why this is not correct as we filter out the tuples
from Parent
in the initial relation while in the recursion we are self-joining with the
Ancestor
relation which does not have all the information required. Concretely, consider
this example:
Let Parent
have the following snapshot:
Parent
:
p | c |
---|---|
Eve | John |
Eve | Adam |
Adam | Bruce |
Adam | Coco |
Now in the initial relation for the recursion seed we would have:
Ancestor
X | Y |
---|---|
Eve | John |
Eve | Adam |
But even when we self-join we would never include the tuples (Adam, Brunce)
and (Adam, Coco)
as that information has been filtered out already. Thus,
this method is inherently wrong.