DS&A

Data Structures and Algorithms

The ‘Array-List’ Data Structure

Feb 192018

In earlier posts we described lists and arrays. We described lists as an abstract data type, meaning we said what a list is and what we should be able to do with one. However, we didn’t say how a list is implemented — we didn’t describe a data structure for a list — so we don’t yet have a scheme for representing lists in memory.(1) On the other hand, when we described arrays, we did also describe how to represent them in memory.

On the face of it, an array cannot do everything a list can do — an array is not a list, and does not support operations like ‘append’, ‘insert’, or ‘remove’ because arrays have fixed lengths. So we can’t use arrays as lists, but nonetheless it’s possible to use arrays to represent lists, using the array’s ‘get’ and ‘set’ operations to implement the missing list operations. In this post we’ll develop a data structure which does just that, called an array-list (or a dynamic array).(2)

The core idea is to use a longer array than necessary, so there is “unused” space which is available for the list to grow. There are details to fill in, but the interactive model below shows this core idea:

(This interactive feature requires Javascript to be enabled in your browser.)

Use the buttons to see which array operations are used to implement the list operations. You should notice that some list operations require several array operations; in particular, to insert or remove a value in the middle of the list, we have to shift all the subsequent values forwards or backwards. You should also notice that the array’s length field is not the same as the list’s length: the array is not a list, it’s being used to represent a list.

This is a good first attempt at a list data structure, but it has a big problem: once there is no “unused” space left in the array, we can no longer append or insert into the list. This means the model above does not behave like a list in all circumstances.

We can’t make the array longer; an array’s length is fixed when it’s created. If we want a longer array, we have to create a new one. To ensure there will be enough space for plenty more elements, let’s make the new array twice as long.(3) Whenever we create a new array, we will have to copy across all the values from the old array.

Try inserting more than four values using the interactive model below:

(This interactive feature requires Javascript to be enabled in your browser.)

So our second attempt solves this problem but there is a more subtle problem.

Except for the missing ‘length’ operation, the model above behaves exactly like a list. But a reference to this array does not behave like a reference to a list. When we create a new array and copy the list values across, any existing references to the old array will still point at the old array; they won’t automatically be updated just because we created a new array.

Try the Python code below to see the problem: the variables a and b reference the same list object, so changing a value in a changes the same list referenced by b. But when we create a new longer list for the variable a, the variable b still references the old list.

>>> a = ['duck']*3
>>> b = a
>>> b
['duck', 'duck', 'duck']
>>> a[2] = 'goose'
>>> b
['duck', 'duck', 'goose']
>>> a = ['duck']*6
>>> a
['duck', 'duck', 'duck', 'duck', 'duck', 'duck']
>>> b
['duck', 'duck', 'goose']
>>> a[5] = 'goose'
>>> a
['duck', 'duck', 'duck', 'duck', 'duck', 'goose']
>>> b
['duck', 'duck', 'goose']

Use BlueJ’s code pad to try the Java code below: the variables a and b reference the same String[] array, so changing a value in a changes the same array referenced by b. But when we create a new longer array for the variable a, the variable b still references the old array.

>>> import java.util.Arrays;
>>> String[] a = new String[] { "duck", "duck", "duck" };
>>> int[] b = a;
>>> Arrays.toString(b)
"[duck, duck, duck]"   (String)
>>> a[2] = "goose";
>>> Arrays.toString(b)
"[duck, duck, goose]"   (String)
>>> a = new String[] { "duck", "duck", "duck", "duck", "duck", "duck" };
>>> a
"[duck, duck, duck, duck, duck, duck]"   (String)
>>> Arrays.toString(b)
"[duck, duck, goose]"   (String)
>>> a[5] = "goose";
>>> Arrays.toString(a)
"[duck, duck, duck, duck, duck, goose]"   (String)
>>> Arrays.toString(b)
"[duck, duck, goose]"   (String)

The solution is to represent the list as a separate object, and this object will hold a reference to the array which currently holds the list’s values. When we replace the array, we change the object’s internal reference to point to the new array; any existing references to the list object will continue to behave as expected. While we’re at it, this object can also hold the list’s length.(4)

We’ve now developed a completely correct list data structure:

(This interactive feature requires Javascript to be enabled in your browser.)

This model demonstrates almost everything worth knowing about how array-lists are implemented.(5) In the next post, we’ll consider an alternative data structure for a list, which doesn’t use arrays.

Footnotes

  1. Of course, almost all programming languages do provide a list data structure. It’s not necessary to implement one ourselves if we just want to use lists, but our goal here is to understand data structures.
  2. The name array-list follows a convention for naming data structures, where the second word (list) says what ADT it implements, and the first word (array) says how it’s implemented.
  3. This is an arbitrary choice; the scale factor doesn’t have to be 2. The choice to start with an array of length 4 is also arbitrary.
  4. Storing the length in a field is not just more efficient than counting the number of items in the array — in principle, lists can contain null values, so we wouldn’t know whether to count them or not.
  5. You may also notice that after removing a lot of values from the list, we can save memory by creating a shorter array.

There are no published comments.

New comment

Your comment will not appear until a moderator approves it.

Atom

hosted on werp.site