We have the idea of an abstract data type (ADT) — a data type which is defined by its set of possible values and set of supported operations, but which does not specify how those values should be represented in memory, nor how the operations should be implemented. We also have a memory model for thinking about data structures, which fill in these details.(1)
Having defined some of the standard collection types as ADTs — lists, sets and dictionaries — it is worth thinking about how to implement them. For this post, let’s just think about sequences. The obvious way to represent a sequence in memory is to put the values one after another, contiguously:
This is a nice simple data structure, which ought to be easy to work with. However, what we gain in simplicity, we lose in flexibility. This data structure has two problems:
- Suppose we want a function to calculate the sum of a sequence. For a sequence of length 5, we can add its five values together; but for a sequence of a different length we would need to do a different number of additions. It’s not possible to determine this sequence’s length just from a reference to it (i.e. if we know its memory address).
- Recall that lists should support the ‘append’ operation — but this data structure cannot support an operation like ‘append’ which makes the sequence longer, because the memory location for the appended value might not be available.
Both of these problems are illustrated in the example below: in problem 1,
var1’s value could be mistaken for part of the sequence;(2)
in problem 2, appending a new value would overwrite
var1, losing its old value.
The first problem is easy to fix: if we need the sequence’s length, let’s just store it in the first memory location, and begin the actual sequence from the second location. This is called a length-prefixed sequence.
Arrays in C and C++ are not length-prefixed. If a function needs to handle arrays of different lengths, then it should take the array’s length as a separate parameter.
We’ll leave the second problem until next week, and just accept that this data structure is not a list; it cannot support any operation which changes the sequence’s length — so the
length field should be read-only.
However, while this is not a list, it’s still a type of sequence, so let’s consider the type of sequence it actually is — an array.(3)
An array represents a fixed-length sequence of values, and supports the following operations:
- Create a new array of length
- Get the value at index
- Set a new value
i, replacing the old value there.
- Get the length of the array.
Python does not natively have any type which behaves like an array instead of a list.
The Python module called
array provides a data structure which is actually an array-list — it does support all of a list’s operations.
Try these operations out using the interactive model below:
- An ADT is a specification, while a data structure is an implementation.
- In particular, the two primitive memory operations ‘read’ and ‘write’ aren’t enough to determine what a given memory location is being used for.
- Perhaps confusingly, the name “array” is used for both the abstract data type and the data structure which implements it. The distinction rarely matters because the data structure (a sequence stored contiguously in memory) is the only sensible way to implement the data type (a fixed-length sequence).