DS&A

Data Structures and Algorithms

Tree Data Structures

Feb 262018

Abstract data types define what data can be represented and what operations can be performed on it, but not how the data should be stored or structured in a program. Data structures are concrete implementations of ADTs which fill in these details — they define how the data should be stored in memory, and what instructions should be followed to perform the operations.

Trees are both; we have an abstract idea of what a tree is, but there are also data structures for them, which we’ll look at in this post. In fact, we can even use tree data structures to implement other ADTs, such as sets or dictionaries.(1)

Actually, our idea of trees is even more abstract than an ADT, because there is no common set of operations which can be performed on all kinds of tree. We might want some or all of the following:

  • Get the tree’s root node.
  • Get or set some data stored at a node.(2)
  • Get the size of the tree (the number of nodes).
  • Get the parent of a node.
  • Get the children of a node, or iterate over them.
  • Check whether a node is a leaf.
  • Insert a new node.(3)
  • Remove a node.(4)
  • Iterate over all the nodes in the tree.
  • Get the siblings of a node, or iterate over them.
  • Get the ancestors of a node, or iterate over them.
  • Iterate over the descendants of a node.

This seems like a lot, but most tree data structures don’t support all of these operations.

List-of-lists

Trees can be represented as outlines, which are nested list structures. In Python, we can write lists with square brackets, and nested lists with nested brackets. This gives us a simple data structure for trees, called a list-of-lists.

Step through the animation below to see how a tree (shown here as a node/link diagram) is represented as a list-of-lists:

The list-of-lists data structure is particularly useful when each leaf node holds one data value, and the internal nodes hold no data. Then, leaf nodes are represented simply by their data values, and internal nodes are represented by lists. Here’s the same data structure as a box-and-pointer diagram:(5)

(This diagram requires Javascript to be enabled in your browser.)

The values in any particular list are that node’s children (or pointers to them). To access a particular node in the tree, we need to ‘get’ from the appropriate index in each nested list.

List-of-children

The list-of-lists data structure is a simple way to represent a tree if only the leaf nodes hold data — then an internal node is its list of children.

Otherwise, if internal nodes should hold data too, then each node has both a data value and a list-of-children.(6) To do this, we define a Node class with fields for a value and a list of children. Here’s the class in Python:

class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children

Using this data structure, Node objects represent both the leaf and internal nodes; a leaf node has no children, so it has no children list.(7)

Step through the animation below to see how a tree is represented using Node objects:

Here’s the box-and-pointer diagram for this data structure — it contains Node objects and lists:

(This diagram requires Javascript to be enabled in your browser.)

To access a child node, we need to ‘get’ from the appropriate index in children.

Parent pointer tree

The list-of-lists and list-of-children data structures give us easy access to a node’s children, but not its parent. One solution is to give the Node class an additional field for the parent. However, in most applications we don’t need to access both children and parents, so it’s usually only worth storing one or the other. If we store a node’s parent instead of its children, then we have a parent pointer tree.

For example, in the tree structure of “fastest routes to London”, you can find the next city in the route by finding the parent of whichever city you’re currently in.

Here’s a class for a parent-pointer tree node in Python:

class PPNode:
    def __init__(self, parent, value):
        self.parent = parent
        self.value = value

Here’s a box-and-pointer diagram for the same tree as above, using the parent pointer tree data structure:

(This diagram requires Javascript to be enabled in your browser.)

The root node has a null pointer for its parent, because it’s the only node with no parent. Since we can’t access an object unless we have a pointer to it, to use this data structure we’d need a separate collection (e.g. a list or a dictionary) to hold pointers to all of the nodes in the tree.

Footnotes

  1. Compare with stacks; a stack is not a list, but we can implement a stack using an array-list or linked list data structure. Similarly, sets and dictionaries are not trees, but we can implement them using tree data structures.
  2. We could store a data value (or multiple data values) at each node, or at each leaf node.
  3. We might specify what it’s parent should be, or the data structure might choose for us.
  4. The node’s descendants could also be removed, or moved elsewhere in the tree.
  5. To keep the box-and-pointer diagrams simple, we’ll show lists as just sequences of values, rather than representing an actual list data structure.
  6. This is a special case of the adjacency list data structure for a graph.
  7. This way we won’t need to inspect the type of a node to determine if it’s a leaf; we can just check whether children is a null pointer. Alternatively we could make children an empty list.

There are no published comments.

New comment

Your comment will not appear until a moderator approves it.

Atom

hosted on werp.site