## Arrays

Feb 122018We have the idea of an abstract data type (ADT) — a data type which is defined by its *set of possible values* and *set of supported operations,* but which does not specify how those values should be represented in memory, nor how the operations should be implemented.
We also have a memory model for thinking about data structures, which fill in these details.(1)

Having defined some of the standard collection types as ADTs — lists, sets and dictionaries — it is worth thinking about how to implement them. For this post, let’s just think about sequences. The obvious way to represent a sequence in memory is to put the values one after another, contiguously:

This is a nice simple data structure, which ought to be easy to work with. However, what we gain in simplicity, we lose in flexibility. This data structure has two problems:

- Suppose we want a function to calculate the sum of a sequence. For a sequence of length 5, we can add its five values together; but for a sequence of a different length we would need to do a different number of additions. It’s not possible to determine this sequence’s length just from a reference to it (i.e. if we know its memory address).
- Recall that lists should support the
*‘append’*operation — but this data structure cannot support an operation like*‘append’*which makes the sequence longer, because the memory location for the appended value might not be available.

Both of these problems are illustrated in the example below: in problem 1, `var1`

’s value could be mistaken for part of the sequence;(2)
in problem 2, appending a new value would overwrite `var1`

, losing its old value.

The first problem is easy to fix: if we need the sequence’s length, let’s just store it in the first memory location, and begin the actual sequence from the second location.
This is called a *length-prefixed* sequence.

#### C/C++-specific detail

Arrays in C and C++ are *not* length-prefixed.
If a function needs to handle arrays of different lengths, then it should take the array’s length as a separate parameter.

We’ll leave the second problem until next week, and just accept that this data structure is not a list; it cannot support any operation which changes the sequence’s length — so the `length`

field should be read-only.
However, while this is not a list, it’s still a type of sequence, so let’s consider the type of sequence it *actually* is — an array.(3)

An *array* represents a fixed-length sequence of values, and supports the following operations:

*Create*a new array of length`n`

.*Get*the value at index`i`

.*Set*a new value`x`

at index`i`

, replacing the old value there.- Get the
*length*of the array.

#### Python-specific detail

Python does not natively have any type which behaves like an array instead of a list.
The Python module called `array`

provides a data structure which is actually an array-list — it *does* support all of a list’s operations.

Try these operations out using the interactive model below:

#### Footnotes

- An ADT is a
*specification*, while a data structure is an*implementation*. - In particular, the two primitive memory operations
*‘read’*and*‘write’*aren’t enough to determine what a given memory location is being used*for*. - Perhaps confusingly, the name “array” is used for both the abstract data type and the data structure which implements it. The distinction rarely matters because the data structure (a sequence stored contiguously in memory) is the only sensible way to implement the data type (a fixed-length sequence).

## There are no published comments.