Tree Data Structures
Feb 262018Abstract data types define what data can be represented and what operations can be performed on it, but not how the data should be stored or structured in a program. Data structures are concrete implementations of ADTs which fill in these details — they define how the data should be stored in memory, and what instructions should be followed to perform the operations.
Trees are both; we have an abstract idea of what a tree is, but there are also data structures for them, which we’ll look at in this post. In fact, we can even use tree data structures to implement other ADTs, such as sets or dictionaries.(1)
Actually, our idea of trees is even more abstract than an ADT, because there is no common set of operations which can be performed on all kinds of tree. We might want some or all of the following:
- Get the tree’s root node.
- Get or set some data stored at a node.(2)
- Get the size of the tree (the number of nodes).
- Get the parent of a node.
- Get the children of a node, or iterate over them.
- Check whether a node is a leaf.
- Insert a new node.(3)
- Remove a node.(4)
- Iterate over all the nodes in the tree.
- Get the siblings of a node, or iterate over them.
- Get the ancestors of a node, or iterate over them.
- Iterate over the descendants of a node.
This seems like a lot, but most tree data structures don’t support all of these operations.
List-of-lists
Trees can be represented as outlines, which are nested list structures. In Python, we can write lists with square brackets, and nested lists with nested brackets. This gives us a simple data structure for trees, called a list-of-lists.
Step through the animation below to see how a tree (shown here as a node/link diagram) is represented as a list-of-lists:
The list-of-lists data structure is particularly useful when each leaf node holds one data value, and the internal nodes hold no data. Then, leaf nodes are represented simply by their data values, and internal nodes are represented by lists. Here’s the same data structure as a box-and-pointer diagram:(5)
The values in any particular list are that node’s children (or pointers to them). To access a particular node in the tree, we need to ‘get’ from the appropriate index in each nested list.
List-of-children
The list-of-lists data structure is a simple way to represent a tree if only the leaf nodes hold data — then an internal node is its list of children.
Otherwise, if internal nodes should hold data too, then each node has both a data value and a list-of-children.(6)
To do this, we define a Node
class with fields for a value
and a list of children
.
Here’s the class in Python:
class Node:
def __init__(self, value, children=None):
self.value = value
self.children = children
Using this data structure, Node
objects represent both the leaf and internal nodes; a leaf node has no children, so it has no children
list.(7)
Step through the animation below to see how a tree is represented using Node
objects:
Here’s the box-and-pointer diagram for this data structure — it contains Node
objects and lists:
To access a child node, we need to ‘get’ from the appropriate index in children
.
Parent pointer tree
The list-of-lists and list-of-children data structures give us easy access to a node’s children, but not its parent.
One solution is to give the Node
class an additional field for the parent
.
However, in most applications we don’t need to access both children and parents, so it’s usually only worth storing one or the other.
If we store a node’s parent instead of its children, then we have a parent pointer tree.
For example, in the tree structure of “fastest routes to London”, you can find the next city in the route by finding the parent of whichever city you’re currently in.
Here’s a class for a parent-pointer tree node in Python:
class PPNode:
def __init__(self, parent, value):
self.parent = parent
self.value = value
Here’s a box-and-pointer diagram for the same tree as above, using the parent pointer tree data structure:
The root node has a null pointer for its parent
, because it’s the only node with no parent.
Since we can’t access an object unless we have a pointer to it, to use this data structure we’d need a separate collection (e.g. a list or a dictionary) to hold pointers to all of the nodes in the tree.
Footnotes
- Compare with stacks; a stack is not a list, but we can implement a stack using an array-list or linked list data structure. Similarly, sets and dictionaries are not trees, but we can implement them using tree data structures.
- We could store a data value (or multiple data values) at each node, or at each leaf node.
- We might specify what it’s parent should be, or the data structure might choose for us.
- The node’s descendants could also be removed, or moved elsewhere in the tree.
- To keep the box-and-pointer diagrams simple, we’ll show lists as just sequences of values, rather than representing an actual list data structure.
- This is a special case of the adjacency list data structure for a graph.
- This way we won’t need to inspect the type of a node to determine if it’s a leaf; we can just check whether
children
is a null pointer. Alternatively we could makechildren
an empty list.
There are no published comments.