Graph Data Structures
Mar 052018A graph is an abstract model of a relation between pairs of things. In the previous post we identified many relations which can be modelled as graphs; however, we don’t yet have any data structures to represent graphs in computer programs.
In this post we’ll see three graph data structures:
To demonstrate these data structures, we’ll use all three to represent the same example graph, which has 5 nodes and 7 edges:
For each data structure, the goal is represent all the necessary data about the graph, not the node/link diagram. The data should be sufficient to draw a diagram of the same graph, but your diagrams might be laid out differently. As you read through this post, try to re-draw the graph using each data structure yourself.
Node/edge data
Before we begin, there is a choice to make. Graphs are only useful for solving concrete problems if the nodes or edges can hold some data, in addition to the data describing the graph structure itself. To keep it simple, we’ll assume each node holds one data value, and edges hold no data.
If the data value (e.g. "node1"
) is unique to the node, then we can represent the node as the value itself; otherwise a node will be an object with a value
field.(1)
Both of these are common, so we’ll show examples for both.
Operations
We should also say what basic operations we would like graph data structures to support. Here are some things we want to be able to do:
- Create a new node.
- Create a new edge from a node to another node.
- Check whether or not a node has an edge to another node.
- Remove an edge.
- Remove a node, and all its edges.
- Iterate over all of the nodes.
- Iterate over all of the edges.
- Iterate over a node’s neighbours (the nodes it has edges to).
Not every graph data structure supports all of these operations — usually we don’t need all of them.
Edge list
A graph has nodes and edges — so the simplest graph data structure has just a list of nodes and a list of edges. Here’s what that looks like:
Each item in edges
is one edge, represented as the pair of nodes the edge is “from” and “to”.
Alternatively, the edges could be objects with from
and to
fields.
Operations
Unfortunately, while the edge list data structure makes it easy to represent the data, it isn’t easy to work with. Simple operations like adding a node or an edge will be straightforward — but finding a node’s neighbours, checking if an edge exists, or removing an edge, will require searching through the entire edge list.(2)
Adjacency list
If we want easy access to a node’s neighbours, we could simply store them with the node. This is the adjacency list data structure:
Hover over the model to see how it corresponds with the edge list data structure.
If the nodes are objects, then they have value
and neighbours
fields to hold the node’s value and its list of neighbours,(3)
and we still need a list of all nodes.
Otherwise, a dictionary is a good way to store both the node values (as keys) associated with their lists of neighbours.
Operations
An adjacency list is better than an edge list in almost every way.(4) We can find a node’s neighbours via the node itself, so to check or remove an edge we only need to search through a node’s neighbours, rather than all of the graph’s edges.
Warning
The name “adjacency list” is potentially misleading — it is not just one list. Each node has its own list of neighbours, i.e. nodes it’s “adjacent” to.
Adjacency matrix
Here’s our final graph data structure, called an adjacency matrix:
This is the least obvious graph data structure here. It represents the graph as a grid of 0s and 1s, with each node corresponding to one row and one column.
The numbers in the grid determine the edges; a 1 means an edge, and a 0 means there is no edge. The row says which node an edge is “from”, and the column says which edge it’s “to” — so for example, the first row is 0 1 1 0 0 because Node #1 has edges to Node #2 and Node #3.
Operations
The main advantage of the adjacency matrix data structure is that adding, removing or checking an edge is very efficient. Assuming the grid is represented using either arrays or array-lists, and we know which index corresponds to which node, then we can ‘get’ or ‘set’ the 0 or 1 for an edge using the right indices.
On the other hand, to find a node’s neighbours, we must search through all the columns in its row to find the 1s — i.e. search through all the nodes in the graph to find which of them are neighbours.
Also, the grid requires memory to store not just the edges, but also “non-edges”. This is especially bad if most nodes only have a few edges — then the grid is mostly 0s. So, normally we only use an adjacency matrix if we need to check edges very efficiently.
Footnotes
- This way, two nodes can hold the same value but still be distinguishable as different objects.
- This is especially inefficient when the graph is very large but each node only has a few neighbours.
- This is actually the same as the list-of-children data structure for a tree, just without the requirement nodes have only one parent.
- The only downside is that it is non-trivial, but still straightforward, to iterate over all of the edges.
There are no published comments.