DS&A

Data Structures and Algorithms

Depth-First Search

Apr 092018

Search problems vary in two ways: what do we want to search for, and what type of collection are we searching in?(1)

When we need a search algorithm, the data structure we want to search is more important than what we want to find. We could even say that linear search and binary search work on different data structures: linear search uses a list, and binary search uses a list which must be kept in order.(2)

To search for something different, we can often just use an existing algorithm with a few small adjustments — consider how easy it is to change the linear search algorithm to search for different things in a list.

On the other hand, linear search fundamentally relies on the list’s iterate operation. If a data structure doesn’t let us iterate over the whole collection, then we can’t just tweak linear search. For example, the list-of-children tree data structure allows us to iterate over a node’s children, but not the entire tree.

Before we get started, let’s specify a search problem:

  • Problem: Given a tree as a list-of-children data structure, find the node with a target value.(3)
    • Input: the root node.(4)
    • Input: a target value.
    • Output: the node with the target value, or None if there is no such node in the tree.
  • Example: In the tree below, the node with the target value 8 is highlighted.

The Python code for the example tree is given below:

class Node:
    def __init__(self, value, children=None):
        # a leaf node has an empty list of children
        if children is None:
            children = []
        self.value = value
        self.children = children

root = Node(1, [
    Node(2, [
        Node(3, [
            Node(4),
            Node(5)
        ]),
        Node(6)
    ]),
    Node(7, [
        Node(8, [
            Node(9)
        ])
    ]),
    Node(10)
])

In this post we’ll see the simplest algorithm which solves this problem; it can be implemented with or without recursion.

Recursive depth-first search

To solve any problem on a tree, the first idea is to use structural recursion. We can design a recursive algorithm by asking two questions: what is the recursive structure, and why would it help to make recursive calls?

A list-of-children tree is a recursive structure because a Node object has a list of children, which are themselves Node objects. If the root node defines the tree, then each child defines a subtree.

If there is a node with the target value, it’s either the root node, or it’s in one of the subtrees. If it’s the root node, we return the node itself; otherwise, if it’s in a subtree, then we can find it with a recursive call.

In this example, the target is in the second subtree. First, the root node does not have the target value 8, then the target is not found by the recursive call on the first subtree. Then the second recursive call finds the correct node, so the function returns it.(5)

Here’s the algorithm in Python:

def depth_first_search(node, target):
    # check the root node first
    if node.value == target:
        return node
    
    # search recursively in the child subtrees
    for child in node.children:
        found = depth_first_search(child, target)
        if found:
            # "go home early"
            return found
    
    # "go home late", target wasn't found
    return None

Test it on the example — the correct output of depth_first_search(root, 8) is the node with value 8, which has a single child with value 9.

This algorithm is called depth-first search, because of the order the nodes are “visited” in. In the loop over node.children, the first recursive call (on node #2) must finish before the next recursive call (on node #7) begins.

The first recursive call searches the entire first subtree, so nodes #3, #4, #5 and #6 will all be “visited” before node #7. That is, the algorithm explores the entire depth of one subtree before moving onto the next subtree.

Iterative depth-first search

Because a tree is a recursive structure, it was easy to solve this problem recursively. But recursive functions are difficult to imagine step-by-step, which makes them hard to debug.(6)

A recursive function is computed using a stack:

  • Each time the function is called, a new frame is pushed onto the call stack.
  • When a frame returns, it’s popped from the call stack.

It’s possible to rewrite the depth-first search algorithm without recursion, using our own stack. The purpose of the call stack is to keep track of which calls haven’t finished yet. We won’t be making recursive calls, so there are no frames; instead, our stack will keep track of nodes which haven’t been “visited” yet.

The key idea is that when we see a child node, we push it onto the stack, instead of making a recursive call to search the subtree immediately. The stack contains nodes which need to be “visited”, so when we’re ready to visit another node, we pop the next one from the stack.

This is another form of the depth-first search algorithm — also called iterative depth-first search, to distinguish it from the recursive implementation. Here’s the code in Python, using a list as a stack (with the append and pop methods):

def iterative_dfs(root_node, target):
    # create a stack of nodes to visit
    nodes_to_visit = [root_node]
    
    while nodes_to_visit:
        # visit the next node
        node = nodes_to_visit.pop()
        
        if node.value == target:
            # found it!
            return node
        
        # make sure we visit the child nodes
        for child in node.children:
            nodes_to_visit.append(child)
    
    # no more nodes to visit, target wasn't found
    return None

Footnotes

  1. Not just what abstract data type — the data structure also matters. For example, binary search is appropriate for array-lists but not linked lists, despite both being lists.
  2. This is normally called an “ordered list”, “sorted list” or “sorted array” data structure. It behaves more like a set than a list, because when we insert a value, we don’t choose where it goes.
  3. If there’s more than one, we assume it doesn’t matter which one we find.
  4. In this data structure, there is no list of all nodes; otherwise this would be a list search problem. The tree is defined by the root node and its descendants.
  5. This is the “go home early” plan — we found the correct node, so we don’t need to continue searching in the third subtree. If the target is not found in any of the subtrees, then we “go home late” by returning None.
  6. Also, recursive functions can sometimes be slower to run than equivalent non-recursive functions.

There are no published comments.

New comment

Your comment will not appear until a moderator approves it.

Atom

hosted on werp.site