DS&A - Data Structures & Algorithms - Six Ways to Think About Trees

Six Ways to Think About Trees

Feb 262018

Trees in computing are an abstract idea with many diverse applications. Visual representations are especially important here, because a lot of ideas about trees are easiest to convey visually. Having more ways to think about trees also makes us more likely to recognise where they’re applicable.

So, in this post we’ll see six different ways to think about and visualise trees:

Outlines
Nested Brackets
Nested Areas
Icicle Diagrams
Node/Link Diagrams
Parent Pointers

Much like in the parable of the blind men and the elephant, these are all different views of the same concept, and there is no single “correct” or “best” one. As we go along, we’ll try to show how they really do all represent the same concept of a tree structure, and we’ll also discuss some of the applications each one is relevant to.

Outlines

Our first example is a document, book, or any other text which has sections and sub-sections. The usual way to show the tree structure of the sections is in a document’s outline, or a book’s table of contents:

Data Structures and Algorithms blog outline
1. Data types and memory
  1. Ones and Zeros
  2. Binary Numbers
  3. Booleans
  4. Memory and Pointers
2. Collection types
  1. Abstract Data Types
  2. Arrays
  3. Stacks and Queues
  4. Exercises
    1. ADTs and Collections
    2. Using Collections
3. Data structures
  1. Box-and-Pointer Diagrams
  2. The ‘Array-List’ Data Structure
  3. The ‘Linked List’ Data Structure
  4. The ‘Hash Map’ Data Structure
  5. Exercises
    1. Box-and-Pointer Diagrams
    2. List Data Structures

In this visualisation, a tree is represented with nested lists; each node is a list item, and lists have sub-lists representing the node’s children. The root node is the item at the top of the nested list structure. Parent/child relationships are represented by indentation.(1)

Although some items have the same name (e.g. “Box-and-Pointer Diagrams”), each item is uniquely identified by its numbering (e.g. “3. V. i.”).

Our second example of a tree structure is a file system, which has directories and subdirectories. The usual way to show the structure of a file system is a tree view (or tree menu), which looks a lot like an outline:

C:
- Documents
  - Abstract Data Types
    - Dictionary.docx
    - List.docx
    - Queue.docx
    - Set.docx
    - Stack.docx
  - Data Structures
    - Array-List.docx
    - Hash Map.docx
    - Linked List.docx
  - readme.txt
- Photos
  - beach1.jpg
  - beach2.jpg
  - cat.jpg
  - profile picture.jpg
- Python
  - binary.py
  - example.py
  - hangman.py
  - readme.txt
  - trees.py
- boot.ini
- pagefile.sys

The nodes are the directories and files; files are necessarily leaf nodes. Nodes may have the same name, but each node is uniquely identified by its path from the root directory. For example, the path of trees.py is C:\Python\trees.py.

Nested Brackets

Expressions in mathematics and programming languages are another example of a tree structure; an expression like 3 + 4*x has the sub-expressions 3 and 4*x, and the further sub-expressions 4 and x. In this tree, the leaf nodes are 3, 4 and x, and there are two other nodes representing the + and * operations; their children are the sub-expressions being added or multiplied.

The usual way to show an expression’s tree structure is with brackets; if every expression and sub-expression had brackets, they would look like this:

((3) + ((4) * (x)))

We don’t normally write all of these brackets in expressions, since they’re implied by the order of operations — * has precedence over +. However, writing all of the brackets shows the tree structure more clearly: each node is represented as a pair of brackets, and its children are nested within those brackets.

+
- 3
- *
  - 4
  - x

Since nested brackets are a textual representation, they’re commonly used when a text format is needed. HTML is a good example: the idea is the same, but instead of ( and ), HTML elements are opened with <foo> and closed with </foo>. You can think of the open-tags as open-brackets, and the close-tags as close-brackets:

html head titleData Structures and Algorithmstitle head body h1Data Structures and Algorithmsh1 pWelcome to my website!p ul liWe've got data structures…li liAnd algorithms…li liAnd that's it!li ul body html

With clever spacing, we can show that nested brackets represent the same structures as “outlines”: these really are different ways of thinking about the same thing.

["Data Structures and Algorithms blog outline", [
    ["Data types and memory", [
        "Ones and Zeros",
        "Binary Numbers",
        "Booleans",
        "Memory and Pointers"
    ]],
    ["Collection types", [
        "Abstract Data Types",
        "Arrays",
        "Stacks and Queues",
        ["Exercises", [
            "ADTs and Collections",
            "Using Collections"
        ]]
    ]],
    ["Data structures", [
        "Box-and-Pointer Diagrams",
        "The 'Array-List' Data Structure",
        "The 'Linked List' Data Structure",
        "The 'Hash Map' Data Structure",
        ["Exercises", [
            "Box-and-Pointer diagrams",
            "List Data Structures"
        ]]
    ]]
]]

Each node in this structure of nested lists can be uniquely identified by the indices used to access it; for example, outline[1][0][1][2] is "Booleans". This gives us our first tree data structure; notice that this is already valid Python code.

Nested Areas

The next visualisation shows tree nodes as areas; areas for child nodes are contained within the area for their parent.(2) Using expression trees as an example again, the nested areas representation looks like this:

3 + 4*x

We can directly relate this to the “nested brackets” representation, showing that the two approaches can represent exactly the same tree structures:

3 + 4*x

The nested areas representation is especially important in graphical applications, where a two-dimensional area is covered by a tree structure.(3) Returning to HTML, a webpage has a two-dimensional layout determined by the structure of its elements. Hover over the elements or code in the example below, to see how the tree structure determines both the horizontal and vertical layout.

Trees

Tree structures have many applications in computing:

File systems

HTML

Domain names

And more!

Even

this

page is a tree!

Thing

Is a tree?

File systems

Yes

HTML

Yes

Domain names

Yes

This page

Yes

article

h2Treesh2

pTree structures have many applications in computing:p

liFile systemsli

liHTMLli

liDomain namesli

liAnd more!li

pEven

emthisem

page is a tree!p

table

thThingth

thIs a tree?th

tdFile systemstd

tdYestd

tdHTMLtd

tdYestd

tdDomain namestd

tdYestd

tdThis pagetd

tdYestd

table

article

Icicle Diagrams

An icicle diagram (or flame graph) shows each node as an area, but instead of nesting the areas, they’re organised either vertically or horizontally. Here’s an icicle diagram of some domain names and their subdomains:

com

facebook

connect

ko-kr

ro-ro

vi-vn

zh-cn

www

google

images

mail

maps

news

www

microsoft

account

bingads

advertise

docs

research

www

The nodes are represented by rectangles, and children are adjacent to their parents. This example shows children to the left of their parents, since that makes sense for domain names; so the root node is furthest right.

Notice that a node represents part of a domain (e.g. news) rather than an actual domain (e.g. news.google.com), but the ‘path’ to the root node determines the actual domain.

Icicle diagrams are often used when the size of the area represents something meaningful about the node. For example, the diagram below shows the size of every file in the Linux kernel source, organised by directory:

Image credit: Brendan Gregg

From this diagram, we can see that most of the source code is for drivers, and GPU and network drivers are a large part of these. The network drivers are almost all for Ethernet or wireless devices.

To see how “nested areas” and icicle diagrams represent the same tree structures, click the button below:

Node/Link Diagrams

Node/link diagrams are a very common way to visualise trees; if you only know one kind of tree diagram, it’s probably this one.(4) Node/link diagrams explicitly represent the parent/child “links” between nodes, using lines or arrows. This is particularly useful when the nodes and links both have some meaning.

For example, the following diagram shows the fastest rail routes between London and other cities in mainland Britain.(5) The nodes represent cities, and the links represent railway lines:

Image credit: Classical geographer (Wikipedia)

London is the root node.

The next example shows the tree structure of domain names again. Child nodes are to the left of their parents, as that’s what makes sense for domain names:

The next example represents the expression tree for 3 + 4*x. The nodes in this diagram shown as circles, with child nodes below their parents; the root node is therefore the one at the top.

Node/link diagrams can show exactly the same tree structures as the other visualisations; click the button to see how they’re equivalent to “icicle diagrams”.

Node/link diagrams are useful because they’re closely related to box-and-pointer diagrams. If we can draw a tree as a node/link diagram, then we can create a data structure to represent it — the nodes will be objects, and the links will be pointers. This means node/link diagrams are an important tool for thinking about tree data structures.

However, we often only use node/link diagrams to solve problems once we’ve already identified that the data has a tree structure. The other visualisations are important because they help us recognise tree structures in the first place.

Parent Pointers

Most of the time we specify a tree structure, we say what each node’s children are. But we can specify the same tree structure by saying what each node’s parent is, except for the root node (which has no parent). This is potentially simpler, because each node can have many children, but only one parent.

For example, the table below shows the parent of each node in the rail network example. Since the nodes represent cities, the parent is the next city you’ll go to if you’re travelling to London (the root node):

Current city	Next city
Birmingham	Rugby
Bristol	London
Carlisle	Crewe
Crewe	Rugby
Edinburgh	Carlisle
Glasgow	Carlisle
Leicester	London
Liverpool	Crewe
London	—
Manchester	Crewe
Newcastle	York
Peterborough	London
Rugby	London
Sheffield	Leicester
York	Peterborough

Hover over the name of a city to find its children in the tree.

This is another textual representation, and it’s hard to visualise the tree structure just using this table. But it is simple and concise, and easy to represent in a computer program — this data can be stored in a dictionary, for example.

We’ve already thought a bit about possible tree data structures, and this is another one; we’ll go into more detail in the next post.

Footnotes

The total indentation shows the level or depth of a node — how far it is from the root node.
This visualisation is a special case of an Euler diagram. Euler diagrams in general also allow partial overlapping, but this can never happen for a tree. Two tree nodes either have an ancestor/descendant relationship or not, so their areas are either totally nested or totally separate.
Or in 3D applications, we can think about “nested volumes” to understand data structures like BSP trees and octrees.
Node/link diagrams are also commonly used for graphs, which can be used to represent any kind of relationship between pairs of things — not necessarily tree structures.
This is a tree since there is only one fastest route between each city and London. The complete rail network can be modelled as a graph, but is not a tree since there are multiple routes between some cities. I don’t guarantee that these routes actually are fastest.

Tags:
trees

Six Ways to Think About Trees

Outlines

Nested Brackets

Nested Areas

Icicle Diagrams

Node/Link Diagrams

Parent Pointers

Footnotes

There are no published comments.