DS&A - Data Structures & Algorithms

Trees

Feb 262018

There are two kinds of ‘tree’ that everyone knows about: actual trees made of wood, and family trees. Family trees are closer to how we think about trees in computing; a family tree is an abstract diagram of the parent/child relationships within a family. There are many kinds of hierarchical relationships which need to be modelled in computer programs.

Trees in computing

In computing, any hierarchical structure can be thought of as a tree:

A document can have sections, which can have subsections.
A directory can contain other directories.
Selecting an option on a menu might open a sub-menu.
An HTML element can contain other HTML elements.
A domain (like werp.site) can have subdomains (like dsaa.werp.site).
An expression (like 3 + 4*x) can have sub-expressions (like 4*x).
A process can spawn subprocesses.

These are all tree structures, because:

There are some ‘things’, which are the same type as each other.
Each ‘thing’ can have ‘sub-things’ (or it might have none).
There is a special ‘thing’ at the top of the hierarchy, and everything else is a ‘sub-thing’, or a ‘sub-thing’ of one of its ‘sub-things’, or so on.

This is necessarily a very abstract description, so it’s worth thinking about how to fill in the ‘blanks’ in some of the examples above:

If the ‘things’ are directories, then the ‘sub-things’ are subdirectories, and there is a root directory called C:\ on Windows, or / on UNIX or Linux.
If the ‘things’ are HTML elements, then the ‘sub-things’ are child elements, and there is a root element which must be the <html> tag.
If the ‘things’ are domains, then the ‘sub-things’ are subdomains, and the ‘root’ is whichever top-level domain is used (e.g. com, org, or site).

Terminology for trees

We are going to need better words than ‘thing’, ‘sub-thing’, ‘contains’ and so on; without specific terminology, it’s hard to communicate and explain ideas about trees. For example, if I say a directory contains another directory, do I mean it’s an immediate subdirectory, or possibly a subdirectory of a subdirectory, or so on?

We normally use a mixture of terminology for actual trees (made of wood), family trees, and graph theory.(1) All of the following is standard:

The ‘things’ in a tree are called nodes.
A ‘sub-thing’ is a child or a child node.
The ‘thing’ at the ‘top’ of the hierarchy is the root node.
Each node (except the root node) has a parent or parent node.
A node is a sibling of another node if they have the same parent.
An ancestor is a parent, or a parent’s parent, or so on.
A descendant is a child, or a child’s child, or so on.
A node with children is an internal node.
A node with no children is a leaf node, or external node.

Now we can distinguish whether a directory contains another directory “as a child” or “as a descendant”. Let’s use this terminology to rewrite our abstract definition:

A tree has nodes, which normally all have the same type.(2)
Each node can have other nodes as children, or it might have none.
There is a root node, which has no parent; all other nodes are descendants of the root node.

We also now have the terminology to specify that each node (except the root) has exactly one parent.(3)

Non-examples of trees

It’s also worth thinking about some examples that aren’t trees:

A building can have floors, which have rooms. However, this isn’t a proper tree structure, because buildings, floors and rooms aren’t the same type of thing. Buildings can’t contain other buildings, and floors can’t contain buildings or other floors.(4)
Technically a family tree is not a tree, because people have two parents. By our definition for a tree, each node (except the root) should have one parent.(5)
A string can have substrings. However, this doesn’t form a tree structure; there is no ‘root’ string, and for example the string "bc" would be a ‘child’ of both "abc" and "bcd".

Conclusion

In this post we discussed what ‘trees’ are in a computing context, and gave some examples of things which are and aren’t trees (by our definition). We also presented the standard terminology for trees.

Here’s some things we haven’t done yet:

There are — deliberately — no pictures or diagrams of trees in this post. There are many ways to visually represent trees, none is “the correct way” or the best way in all situations. Also, most of the time trees are applicable, there’s no visual clue; so it’s really important to be able to identify this from verbal descriptions like “a directory can have subdirectories”.
A type defines a kind of data, and what operations are supported. We’ve discussed the data, but not the operations.
We don’t yet have any data structures to implement trees, or algorithms to solve problems with them.

These are all coming later; the next post will show visual representations, so we’ll have many different ways to understand trees.

Footnotes

Graph theory is used to mathematically model any kind of relationship between pairs of things; for example, the ‘sub-thing’ relationship in a tree.
Alternatively, a tree might have internal nodes of one type, and leaf nodes of another type.
Therefore, there are no ‘diamonds’ where a node is an ancestor of another node in two ways, and no ‘cycles’ where a node is an ancestor of itself.
In principle, we could still model a building as a tree structure, but we’d prefer a model which enforces these constraints.
Alternatively, a family tree could show just the ancestors of one person. In this case, a node’s children in the tree would actually be that person’s parents, and vice versa. But this would still not necessarily be a tree, because a person can have the same ancestor in two lineages.

Tags:
trees