## Depth-First Search

Apr 092018Search problems vary in two ways: what do we want to search for, and what type of collection are we searching in?(1)

When we need a search algorithm, the data structure we want to search is more important than what we want to find. We could even say that linear search and binary search work on different data structures: linear search uses a list, and binary search uses a list which must be kept in order.(2)

To search for something different, we can often just use an existing algorithm with a few small adjustments — consider how easy it is to change the linear search algorithm to search for different things in a list.

On the other hand, linear search fundamentally relies on the list’s *iterate* operation.
If a data structure doesn’t let us iterate over the whole collection, then we can’t just tweak linear search.
For example, the list-of-children tree data structure allows us to iterate over a node’s children, but not the entire tree.

Before we get started, let’s specify a search problem:

*Problem:*Given a tree as a*list-of-children*data structure, find the node with a target value.(3)*Input:*the root node.(4)*Input:*a target value.*Output:*the node with the target value, or`None`

if there is no such node in the tree.

*Example:*In the tree below, the node with the target value`8`

is highlighted.

The Python code for the example tree is given below:

```
class Node:
def __init__(self, value, children=None):
# a leaf node has an empty list of children
if children is None:
children = []
self.value = value
self.children = children
root = Node(1, [
Node(2, [
Node(3, [
Node(4),
Node(5)
]),
Node(6)
]),
Node(7, [
Node(8, [
Node(9)
])
]),
Node(10)
])
```

In this post we’ll see the simplest algorithm which solves this problem; it can be implemented with or without recursion.

### Recursive depth-first search

To solve any problem on a tree, the first idea is to use structural recursion.
We can design a recursive algorithm by asking two questions: *what* is the recursive structure, and *why* would it help to make recursive calls?

A *list-of-children* tree is a recursive structure because a `Node`

object has a list of children, which are themselves `Node`

objects.
If the root node defines the tree, then each child defines a *subtree*.

If there is a node with the target value, it’s either the root node, or it’s in one of the subtrees. If it’s the root node, we return the node itself; otherwise, if it’s in a subtree, then we can find it with a recursive call.

In this example, the target is in the second subtree.
First, the root node does not have the target value `8`

, then the target is not found by the recursive call on the first subtree.
Then the second recursive call finds the correct node, so the function returns it.(5)

Here’s the algorithm in Python:

```
def depth_first_search(node, target):
# check the root node first
if node.value == target:
return node
# search recursively in the child subtrees
for child in node.children:
found = depth_first_search(child, target)
if found:
# "go home early"
return found
# "go home late", target wasn't found
return None
```

Test it on the example — the correct output of `depth_first_search(root, 8)`

is the node with value `8`

, which has a single child with value `9`

.

This algorithm is called depth-first search, because of the order the nodes are “visited” in.
In the loop over `node.children`

, the first recursive call (on node #2) must finish before the next recursive call (on node #7) begins.

The first recursive call searches the entire first subtree, so nodes #3, #4, #5 and #6 will all be “visited” before node #7.
That is, the algorithm explores the entire *depth* of one subtree before moving onto the next subtree.

### Iterative depth-first search

Because a tree is a recursive structure, it was easy to solve this problem recursively. But recursive functions are difficult to imagine step-by-step, which makes them hard to debug.(6)

A recursive function is computed using a stack:

- Each time the function is called, a new frame is
*pushed*onto the call stack. - When a frame returns, it’s
*popped*from the call stack.

It’s possible to rewrite the *depth-first search* algorithm without recursion, using our own stack.
The purpose of the call stack is to keep track of which calls haven’t finished yet.
We won’t be making recursive calls, so there are no frames; instead, our stack will keep track of nodes which haven’t been “visited” yet.

The key idea is that when we see a child node, we *push* it onto the stack, instead of making a recursive call to search the subtree immediately.
The stack contains nodes which need to be “visited”, so when we’re ready to visit another node, we *pop* the next one from the stack.

This is another form of the *depth-first search* algorithm — also called *iterative depth-first search,* to distinguish it from the recursive implementation.
Here’s the code in Python, using a list as a stack (with the `append`

and `pop`

methods):

```
def iterative_dfs(root_node, target):
# create a stack of nodes to visit
nodes_to_visit = [root_node]
while nodes_to_visit:
# visit the next node
node = nodes_to_visit.pop()
if node.value == target:
# found it!
return node
# make sure we visit the child nodes
for child in node.children:
nodes_to_visit.append(child)
# no more nodes to visit, target wasn't found
return None
```

#### Footnotes

- Not just what abstract data type — the data structure also matters. For example, binary search is appropriate for array-lists but not linked lists, despite both being lists.
- This is normally called an “ordered list”, “sorted list” or “sorted array” data structure. It behaves more like a set than a list, because when we insert a value, we don’t choose where it goes.
- If there’s more than one, we assume it doesn’t matter which one we find.
- In this data structure, there is no list of all nodes; otherwise this would be a list search problem. The tree is defined by the root node and its descendants.
- This is the “go home early” plan — we found the correct node, so we don’t need to continue searching in the third subtree.
If the target is not found in any of the subtrees, then we “go home late” by returning
`None`

. - Also, recursive functions can sometimes be slower to run than equivalent non-recursive functions.

## There are no published comments.