-
Notifications
You must be signed in to change notification settings - Fork 0
Undirected Graph Algorithms
A few definitions first, to make sense of things.
Definition. A "graph" consists of a set of "vertices" (or nodes) equipped with a set of "edges". We write G = (V, E) for a graph G which consists of a set of vertices V and edges E. An edge connects precisely two vertices.
If two vertices are connected by an edge, they are "adjacent" to each other.
The "degree" of a vertex is the number of edges connected to it.
A "subgraph" consists of a subset of a graph's edges (and associated vertices) which constitutes a graph. (End of definitions.)
Definition. Let G = (V, E) be a graph. A "path" is a sequence of vertices cnnected by edges. A "simple path" is one with no repeated vertices.
The "length" of a path is its number of edges.
A "cycle" is a path that starts and ends at the same node.
We say a graph is "connected" if there is a path from every vertex to every other vertex. A graph that is not connected consists of a set of "connected components" which are its maximal connected subgraphs. (End of definitions)
We can ask some interesting questions already, e.g., are two given vertices connected by a path? If so, what's the shortest path connecting them?
Hamilton problem: Can we construct a cycle which uses every vertex exactly once?
Euler problem: Can we construct a cycle which uses every edge exactly once?
Biconnectivity problem: is there any vertex such that, if we remove, then we increase the number of connected components?
Lets consider the following API for undirected graphs.
public class Graph {
/* create a V-vertex graph with no edges */
Graph(int V)
{ ... }
/* read a graph from input stream in */
Graph(In in)
{ ... }
/* number of vertices */
int V()
{ ... }
/* number of edges */
int E()
{ ... }
/* add edge v-w to this graph */
void addEdge(int v, int w)
{ ... }
/* vertices adjacent to v */
Iterable<Integer> adj(int v)
{ ... }
/* string representation */
String to String()
{ ... }
}
We'll use it throughout this page.
Think of a maze. We can model it with a graph, where each intersection is represented by a vertex, and edges represent passages.
The goal: explore every intersection in the maze.
Trémaux Maze Exploration Algorithm
- Unroll a bag of string behind you.
- Mark each visited intersection and each visited passage.
- Retrace steps when no unvisited options.
Some argue this algorithm was first used when Theseus entered the Labyrinth to kill the Minotaur. Ariadne contrived the algorithm, to make sure Theseus wouldn't get lost.
Goal. Systematically search through a graph.
Idea. Mimic maze exploration
DepthFirstSearch
(to visit a vertex v
)
- Mark
v
as visited - Recursively visit all unmarked vertices
w
adjacent tov
Typical applications. Find all vertices connected to a given source vertex. Or more generally, find a path between two vertices.
Challenge: How do we implement this?
Design Pattern. Decouple graph data type from the graph processing.
- Create a
Graph
object - Pass the
Graph
object to a graph-processing routine. - Query the graph-processing routine for information.
public class Paths {
// find paths in G from source s
Paths(Graph G, int s)
{ /* ... */ }
// is there a path from s to v?
boolean hasPathTo(int v)
{ /* ... */ }
// path from s to v, null if no such path
Iterable<Integer> pathTo(int v)
{ /* ... */ }
/* ... */
}
So an example usage:
Paths paths = new Paths(G, s);
for (int v = 0; v < G.V(); v++)
if (paths.hasPathTo(v))
StdOut.println(v); // print all vertices connected to s
We'll use two data structures:
-
boolean[] marked
to mark visited vertices -
int[] edgeTo
keeps tree of paths, so(edgeTo[w] == v)
means edgev-w
taken to visitw
for first time.
Lets examine the implementation!
public class DepthFirstPaths
{
// marked [v] = true if v connected to s
private boolean[] marked;
// edgeTo[v] = previous vertex on path from s to v
private int[] edgeTo;
private int s;
public DepthFirstSearch(Graph G, int s)
{
... // initialize data structures
dfs(G, s); // find vertices connected to s
}
// recursive DFS does the work
private void dfs(Graph G, int v)
{
marked[v] = true;
for (int w : G.adj(v))
if (!marked[w])
{
dfs(G, w);
edgeTo[w] = v;
}
}
}
Proposition. Depth First Search marks all vertices connected to s
in time proportional to the sum of their degrees. (End of proposition)
Sketch of Proof for Correctness. Well, if w
is marked, then w
is
connected to s
.
If w
is connected to s
, then w
is marked (if w
is unmarked, then
consider the last edge on a path from s
to w
that goes from a marked
vertex to an unmarked one). (End of proof)
Sketch of Proof for Running time.
Each vertex connected to s
is visited once. So for each one of them,
it goes through each of their adjacent vertices. (End of proof)
Proposition. After DFS, can find vertices connected to s
in
constant time and can find a path to s
(if one exists) in time
proportional to its length. (End of proposition)
Proof. Well, edgeTo[]
is parent-link representation of a tree rooted
at s
. We see the code implementation
public boolean hasPathTo(int v)
{ return marked[v]; }
which is precisely a constant time algorithm. Then finding the path to
s
is implemented as
public Iterable<Integer> pathTo(int v)
{
if (!hasPathTo(v)) return null;
Stack<Integer> path = new Stack<Integer>();
for (int x = v; x != s; x = edgeTo[x])
path.push(x);
path.push(s);
return path;
}
where we see the for()
-loop iterates once for each edge in the path,
which gives us the second property. (End of proof)
Example 1: Dating.
Example 2. Floop fill (photoshop magic wand).
Assumptions. Picture has billions of pixels.
Solutions. Build a grid graph, where vertex is a pixel, an edge connected two adjacent pixels of approximately the same color, and a blob are all pixels connected to a given pixel (i.e., a connected component).
This is not a recursive algorithm. It uses a queue. We "mark" vertices,
which we keep track of using an array boolean[] marked
. We'll put the
source vertex in the queue, then repeat the following steps:
- Remove vertex
v
from the queue - Add to queue all unmarked vertices adjacent to
v
and mark them (i.e., setmarked[w]=true
for eachw
adjacent tov
).
Note: If the queue is empty, every vertex has been examined.
Question: What happens if we use a stack instead of a queue?
Answer: We'd get depth-first search instead! (Awesome!)
Problem. Find a path from s
to t
that uses fewest number of edges.
The basic algorithm for breadth-first-search could be outlined as such:
BreadthFirstSearch
(from source vertex s
)
- Put
s
onto a FIFO queue, and marks
as visited - Repeat the following steps until the queue is empty:
- Remove the least recently added vertex
v
("dequeue a vertex") - Add each (unvisited) neighbor of
v
to the queue, and mark them as visited.
The intuition is that Breadth-first examines vertices in
increasing-distance from s
.
Proposition. BFS computes shortest paths (fewest number of edges)
from s
to all other vertices in a graph in time proportional to
E+V
.
Proof (Correctness). Queue always consists of zero or more vertices of
distance k
from s
, followed by zero or more vertices of distance
k+1
. This is just by the FIFO-structure of a queue. So it's ordered by
distance.
(End of proof)
Proof (Running time). We visit vertices once, so it's proportional to the number of vertices and the number of edges. (End of proof)
The implementation is quite simple, "follow-your-nose", and clean.
public class BreadthFirstPaths
{
private boolean[] marked;
private int[] edgeTo;
private final int s;
/* ... */
private void bfs(Graph G, int s)
{
Queue<Integer> q = new Queue<Integer>();
q.enqueue(s);
marked[s] = true;
while (!q.isEmpty())
{
int v = q.dequeue();
for (int w : G.adj(v))
{
if (!marked[w])
{
q.enqueue(w);
marked[w] = true;
edgeTo[w] = v;
}
}
}
}
}
Example 1: Routing. When communicating over a network, you want to communicate with the "fewest number of hops". Breadth first search would give you the shortest number of nodes between the source and the destination.
Example 2: Kevin Bacon Number. We can solve the Six Degrees of Kevin Bacon "problem" using Breadth-First search.
Example 2.1: Erdös Number. Set up a graph. The nodes are mathematicians. When two mathematicians co-author a paper, we setup an edge between them. What's the shortest distance between a given mathematician and Erdös? This is the given mathematician's "Erdös number". We can solve it using Breadth-First search!
Really, though, it's just a variant of the Kevin Bacon problem (alas, Erdös is just a variant of Kevin Bacon!).
This is a different problem, but it's very useful in daily life (trust me!).
Definition. Vertices v
and w
are "Connected" if there is a
path between them.
Goal. Preprocess a graph to answer questions of the form "Is v
connected to w
?" in constant time.
We'd like to do this for a huge, sparse graph...which in practice the adjacency list implementation cannot easily handle.
So we implement our own data structure. The skeleton of the class looks like:
public class CC
{
public CC(Graph G) // find connected components in G
{ ... }
boolean connected(int v, int w) // are v and w connected?
{ ... }
int count() // number of connected components
{ ... }
int id(int v) // component identifier for v
{ ... }
}
Is this union-find? Well, not quite...we couldn't do union-find in constant time (alas!). But it is depth-first search!
The relation "is connected to" is an equivalence relation, meaning we have the following properties:
-
Reflexive:
v
is connected tov
-
Symmetric: if
v
is connected tow
, thenw
is connected tov
-
Transitive: if
v
is connected tow
, andw
is connected tox
, thenv
is connected tox
.
Definition. A connected component is a maximal set of connected vertices. (End of definition)
So we need to have some lookup int[] lookup
where lookup[v] == lookup[w]
iff w
is connected to v
. This would give us a constant
time solution.
Goal: Partition vertices into connected components.
Algorithm.
- Initialize all vertices
v
as unmarked - For each unmarked vertex
v
, runDFS
to identify all vertices discovered as part of the same component.
The implementation is relatively straightforward, but some comments are necessary.
public class CC
{
private boolean marked[];
private int[] id; // id[v] = id of component containing v
private int count; // number of components
public CC(Graph G)
{
marked = new boolean[G.V()];
// run DFS from one vertex in each component
id = new int[G.V()];
for (int v = 0; v < G.V(); v++)
{
if (!marked[v])
{
dfs(G, v);
count++;
}
}
}
public int count()
{ return count; }
public int id(int v)
{ return id[v]; }
private void dfs(Graph G, int v)
{
marked[v] = true;
id[v] = count; // the id is the component-number
for (int w : G.adj(v))
if (!marked[w])
dfs(G, w);
}
}
Lets consider different problems with graph processing, and try to develop intuitions for what problems are easy and which are difficult.
Definition. A graph is "bipartite" if we can divide the vertices
into two subsets, say W
and X
, such that each edge connects a vertex
in W
to a vertex in X
. (End of definition)
So, no edge connects a vertex in X
to another vertex in X
. The
subgraph consisting of vertices X
is totally disconnected. And
likewise the subgraph consisting of vertices W
is totally
disconnected.
Problem: Is a graph bipartite?
There is a simple Depth-First Search solution.
Problem: Does a given graph contain a cycle or not?
Again, a depth-first search solution may be constructed.
Euler posed this problem in 1736:
... in Konigsberg in Prussia, there is an island A, called the Kneiphof; the river wihch surrounds it is divided into two branches...and these branches are crossed by seven bridges. Concerning these bridges, it was asked whether anyone could arrange a route in such a way that he could cross each bridge once and only once.
Each bridge is represented by an edge, the island and banks are vertices.
So the graph representation of this might be given by the graphviz code:
graph {
C -- A;
A -- C;
{rank=same; A; D;}
A -- B;
B -- A;
C -- D;
B -- D;
}
Euler tour. Is there a (general) cycle that uses each edge exactly once?
Answer: Yes! But only iff the graph is connected and all vertices have an even degree.
But the problem is finding one explicitly.
Problem: What if we want to find a cycle which visits each vertex exactly once?
This problem is intractable. It's an NP-complete problem. No one knows an efficient solution for this problem for large graphs.
Problem: Given two graphs, are they identical except for vertex names?
Well, no one knows how to classify this problem! It may be easy, or unsolveable. We can't show it either way.
- David Eppstein's Lecture notes for February 15, 1996 for UC Irvine's ICS 161: Design and Analysis of Algorithms.
- Kevin Wayne and Robert Sedgewick's Algorithms. Fourth ed. See chapter 4, section 1.
- CLRS Algorithms. Third ed. See sections 22.2 (Breadth-first search) and 22.3 (Depth-first search)