Skip to content

Undirected Graph Algorithms

Alex edited this page Nov 8, 2013 · 2 revisions

Undirected Graphs

Vocabulary

A few definitions first, to make sense of things.

Definition. A "graph" consists of a set of "vertices" (or nodes) equipped with a set of "edges". We write G = (V, E) for a graph G which consists of a set of vertices V and edges E. An edge connects precisely two vertices.

If two vertices are connected by an edge, they are "adjacent" to each other.

The "degree" of a vertex is the number of edges connected to it.

A "subgraph" consists of a subset of a graph's edges (and associated vertices) which constitutes a graph. (End of definitions.)

Definition. Let G = (V, E) be a graph. A "path" is a sequence of vertices cnnected by edges. A "simple path" is one with no repeated vertices.

The "length" of a path is its number of edges.

A "cycle" is a path that starts and ends at the same node.

We say a graph is "connected" if there is a path from every vertex to every other vertex. A graph that is not connected consists of a set of "connected components" which are its maximal connected subgraphs. (End of definitions)

We can ask some interesting questions already, e.g., are two given vertices connected by a path? If so, what's the shortest path connecting them?

Hamilton problem: Can we construct a cycle which uses every vertex exactly once?

Euler problem: Can we construct a cycle which uses every edge exactly once?

Biconnectivity problem: is there any vertex such that, if we remove, then we increase the number of connected components?

API

Lets consider the following API for undirected graphs.

public class Graph {
    /* create a V-vertex graph with no edges */
    Graph(int V) 
    { ... }
    /* read a graph from input stream in */
    Graph(In in)
    { ... }
    /* number of vertices */
    int V()
    { ... }
    /* number of edges */
    int E()
    { ... }
    /* add edge v-w to this graph */
    void addEdge(int v, int w)
    { ... }
    /* vertices adjacent to v */
    Iterable<Integer> adj(int v)
    { ... }
    /* string representation */
    String to String()
    { ... }
}

We'll use it throughout this page.

Depth-first Search

Think of a maze. We can model it with a graph, where each intersection is represented by a vertex, and edges represent passages.

The goal: explore every intersection in the maze.

Trémaux Maze Exploration Algorithm

  1. Unroll a bag of string behind you.
  2. Mark each visited intersection and each visited passage.
  3. Retrace steps when no unvisited options.

Some argue this algorithm was first used when Theseus entered the Labyrinth to kill the Minotaur. Ariadne contrived the algorithm, to make sure Theseus wouldn't get lost.

Goal. Systematically search through a graph.

Idea. Mimic maze exploration

DepthFirstSearch (to visit a vertex v)

  1. Mark v as visited
  2. Recursively visit all unmarked vertices w adjacent to v

Typical applications. Find all vertices connected to a given source vertex. Or more generally, find a path between two vertices.

Challenge: How do we implement this?

Design Pattern. Decouple graph data type from the graph processing.

  1. Create a Graph object
  2. Pass the Graph object to a graph-processing routine.
  3. Query the graph-processing routine for information.
public class Paths {
    // find paths in G from source s
    Paths(Graph G, int s) 
    { /* ... */ }
    // is there a path from s to v?
    boolean hasPathTo(int v) 
    { /* ... */ }
    // path from s to v, null if no such path
    Iterable<Integer> pathTo(int v) 
    { /* ... */ }
    /* ... */
}

So an example usage:

Paths paths = new Paths(G, s);
for (int v = 0; v < G.V(); v++)
    if (paths.hasPathTo(v))
        StdOut.println(v); // print all vertices connected to s

We'll use two data structures:

  1. boolean[] marked to mark visited vertices
  2. int[] edgeTo keeps tree of paths, so (edgeTo[w] == v) means edge v-w taken to visit w for first time.

Implementation

Lets examine the implementation!

public class DepthFirstPaths
{
    // marked [v] = true if v connected to s
    private boolean[] marked;
    // edgeTo[v] = previous vertex on path from s to v
    private int[] edgeTo;
    private int s;
    
    public DepthFirstSearch(Graph G, int s)
    {
        ...        // initialize data structures
        dfs(G, s); // find vertices connected to s
    }
    
    // recursive DFS does the work
    private void dfs(Graph G, int v)
    {
        marked[v] = true;
        for (int w : G.adj(v))
            if (!marked[w])
            {
                dfs(G, w);
                edgeTo[w] = v;
            }
    }
}

Depth-First Search Algorithm Properties

Proposition. Depth First Search marks all vertices connected to s in time proportional to the sum of their degrees. (End of proposition)

Sketch of Proof for Correctness. Well, if w is marked, then w is connected to s.

If w is connected to s, then w is marked (if w is unmarked, then consider the last edge on a path from s to w that goes from a marked vertex to an unmarked one). (End of proof)

Sketch of Proof for Running time. Each vertex connected to s is visited once. So for each one of them, it goes through each of their adjacent vertices. (End of proof)

Proposition. After DFS, can find vertices connected to s in constant time and can find a path to s (if one exists) in time proportional to its length. (End of proposition)

Proof. Well, edgeTo[] is parent-link representation of a tree rooted at s. We see the code implementation

public boolean hasPathTo(int v)
{ return marked[v]; }

which is precisely a constant time algorithm. Then finding the path to s is implemented as

public Iterable<Integer> pathTo(int v)
{
    if (!hasPathTo(v)) return null;
    Stack<Integer> path = new Stack<Integer>();
    for (int x = v; x != s; x = edgeTo[x])
        path.push(x);
    path.push(s);
    return path;
}

where we see the for()-loop iterates once for each edge in the path, which gives us the second property. (End of proof)

Applications

Example 1: Dating.

A breadth-first search makes a lot of sense for dating in general, actually; it suggests dating a bunch of people casually before getting serious, rather than having a series of five-year relationships one after the other.

Example 2. Floop fill (photoshop magic wand).

Assumptions. Picture has billions of pixels.

Solutions. Build a grid graph, where vertex is a pixel, an edge connected two adjacent pixels of approximately the same color, and a blob are all pixels connected to a given pixel (i.e., a connected component).

Breadth-First Search

This is not a recursive algorithm. It uses a queue. We "mark" vertices, which we keep track of using an array boolean[] marked. We'll put the source vertex in the queue, then repeat the following steps:

  1. Remove vertex v from the queue
  2. Add to queue all unmarked vertices adjacent to v and mark them (i.e., set marked[w]=true for each w adjacent to v).

Note: If the queue is empty, every vertex has been examined.

Question: What happens if we use a stack instead of a queue?

Answer: We'd get depth-first search instead! (Awesome!)

Problem. Find a path from s to t that uses fewest number of edges.

The basic algorithm for breadth-first-search could be outlined as such:

BreadthFirstSearch (from source vertex s)

  1. Put s onto a FIFO queue, and mark s as visited
  2. Repeat the following steps until the queue is empty:
  3. Remove the least recently added vertex v ("dequeue a vertex")
  4. Add each (unvisited) neighbor of v to the queue, and mark them as visited.

The intuition is that Breadth-first examines vertices in increasing-distance from s.

Properties

Proposition. BFS computes shortest paths (fewest number of edges) from s to all other vertices in a graph in time proportional to E+V.

Proof (Correctness). Queue always consists of zero or more vertices of distance k from s, followed by zero or more vertices of distance k+1. This is just by the FIFO-structure of a queue. So it's ordered by distance. (End of proof)

Proof (Running time). We visit vertices once, so it's proportional to the number of vertices and the number of edges. (End of proof)

Implementation

The implementation is quite simple, "follow-your-nose", and clean.

public class BreadthFirstPaths
{
    private boolean[] marked;
    private int[] edgeTo;
    private final int s;
    /* ... */
    
    private void bfs(Graph G, int s)
    {
        Queue<Integer> q = new Queue<Integer>();
        q.enqueue(s);
        marked[s] = true;
        while (!q.isEmpty())
        {
            int v = q.dequeue();
            for (int w : G.adj(v))
            {
                if (!marked[w])
                {
                    q.enqueue(w);
                    marked[w] = true;
                    edgeTo[w] = v;
                }
            }
        }
    }
}

Applications

Example 1: Routing. When communicating over a network, you want to communicate with the "fewest number of hops". Breadth first search would give you the shortest number of nodes between the source and the destination.

Example 2: Kevin Bacon Number. We can solve the Six Degrees of Kevin Bacon "problem" using Breadth-First search.

Example 2.1: Erdös Number. Set up a graph. The nodes are mathematicians. When two mathematicians co-author a paper, we setup an edge between them. What's the shortest distance between a given mathematician and Erdös? This is the given mathematician's "Erdös number". We can solve it using Breadth-First search!

Really, though, it's just a variant of the Kevin Bacon problem (alas, Erdös is just a variant of Kevin Bacon!).

Connected Components

This is a different problem, but it's very useful in daily life (trust me!).

Definition. Vertices v and w are "Connected" if there is a path between them.

Goal. Preprocess a graph to answer questions of the form "Is v connected to w?" in constant time.

We'd like to do this for a huge, sparse graph...which in practice the adjacency list implementation cannot easily handle.

So we implement our own data structure. The skeleton of the class looks like:

public class CC 
{
    public CC(Graph G) // find connected components in G
    { ... }
    
    boolean connected(int v, int w) // are v and w connected?
    { ... }
    
    int count() // number of connected components
    { ... }
    
    int id(int v) // component identifier for v
    { ... }
}

Is this union-find? Well, not quite...we couldn't do union-find in constant time (alas!). But it is depth-first search!

Connectedness Relation

The relation "is connected to" is an equivalence relation, meaning we have the following properties:

  1. Reflexive: v is connected to v
  2. Symmetric: if v is connected to w, then w is connected to v
  3. Transitive: if v is connected to w, and w is connected to x, then v is connected to x.

Definition. A connected component is a maximal set of connected vertices. (End of definition)

So we need to have some lookup int[] lookup where lookup[v] == lookup[w] iff w is connected to v. This would give us a constant time solution.

Goal: Partition vertices into connected components.

Algorithm.

  1. Initialize all vertices v as unmarked
  2. For each unmarked vertex v, run DFS to identify all vertices discovered as part of the same component.

Implementation

The implementation is relatively straightforward, but some comments are necessary.

public class CC
{
    private boolean marked[]; 
    private int[] id; // id[v] = id of component containing v
    private int count; // number of components
    
    public CC(Graph G)
    {
        marked = new boolean[G.V()];
        // run DFS from one vertex in each component
        id = new int[G.V()];
        for (int v = 0; v < G.V(); v++)
        {
            if (!marked[v])
            {
                dfs(G, v);
                count++;
            }
        }
    }
    
    public int count()
    { return count; }
    
    public int id(int v)
    {  return id[v]; }
    
    private void dfs(Graph G, int v)
    {
        marked[v] = true;
        id[v] = count;          // the id is the component-number
        for (int w : G.adj(v))
            if (!marked[w])
                dfs(G, w);
    }
}

Challenges

Lets consider different problems with graph processing, and try to develop intuitions for what problems are easy and which are difficult.

Example

Definition. A graph is "bipartite" if we can divide the vertices into two subsets, say W and X, such that each edge connects a vertex in W to a vertex in X. (End of definition)

So, no edge connects a vertex in X to another vertex in X. The subgraph consisting of vertices X is totally disconnected. And likewise the subgraph consisting of vertices W is totally disconnected.

Problem: Is a graph bipartite?

There is a simple Depth-First Search solution.

Example: Find a Cycle

Problem: Does a given graph contain a cycle or not?

Again, a depth-first search solution may be constructed.

Seven Bridges of Konigsberg

Euler posed this problem in 1736:

... in Konigsberg in Prussia, there is an island A, called the Kneiphof; the river wihch surrounds it is divided into two branches...and these branches are crossed by seven bridges. Concerning these bridges, it was asked whether anyone could arrange a route in such a way that he could cross each bridge once and only once.

Each bridge is represented by an edge, the island and banks are vertices.

So the graph representation of this might be given by the graphviz code:

graph {
    C -- A;
    A -- C;
    {rank=same; A; D;}
    A -- B;
    B -- A;
    C -- D;
    B -- D;
}

Euler tour. Is there a (general) cycle that uses each edge exactly once?

Answer: Yes! But only iff the graph is connected and all vertices have an even degree.

But the problem is finding one explicitly.

Example: Traveling Salesman

Problem: What if we want to find a cycle which visits each vertex exactly once?

This problem is intractable. It's an NP-complete problem. No one knows an efficient solution for this problem for large graphs.

Example: Graph Isomorphism

Problem: Given two graphs, are they identical except for vertex names?

Well, no one knows how to classify this problem! It may be easy, or unsolveable. We can't show it either way.

References

  • David Eppstein's Lecture notes for February 15, 1996 for UC Irvine's ICS 161: Design and Analysis of Algorithms.
  • Kevin Wayne and Robert Sedgewick's Algorithms. Fourth ed. See chapter 4, section 1.
  • CLRS Algorithms. Third ed. See sections 22.2 (Breadth-first search) and 22.3 (Depth-first search)
Clone this wiki locally