DS&A - Data Structures & Algorithms

Variants

May 142018

An algorithm is correct if it always returns the correct result.

Assertions are our main tool for proving algorithms are correct; they state that some condition will be true whenever a particular line of code is reached. The goal is to assert that, when the algorithm finishes running, the result is correct. If this assertion never fails, then the algorithm will never return a wrong result.

An unfortunate problem

Consider the following sorting algorithm; assume the is_sorted function tests whether lst is sorted or not.

def infinite_loop_sort(lst):
    while not is_sorted(lst):
        # do nothing
        pass

This loop does nothing! Certainly, it makes no progress towards sorting the list. Yet we can be absolutely sure it never returns a wrong result, because it never returns a result at all!

This is really a problem. If we write a return statement, and assert that the result is correct, the techniques we developed in the previous post actually allow us to “prove” that this clearly incorrect algorithm is “correct”:

def infinite_loop_sort(lst):
    while not is_sorted(lst):
        # do nothing
        pass
    
    assert is_sorted(lst)
    return lst

The “proof” is as follows. At the end of the loop, the assertion cannot fail, because if is_sorted(lst) is false, the loop would not have stopped. So, the algorithm never returns a wrong result.

Termination

What we want to prove is that the algorithm always returns a correct result. But assertions only let us say that if the algorithm returns a result, then the result will be correct.(1)

We also need to prove that the algorithm will return a result at all — i.e. that it will eventually terminate, or finish running. In this post, we’ll explore the idea of variants, which are needed to prove that algorithms terminate. The good news is that variants are simpler and easier to use than invariants.

Analogy

In the game of chess, a pawn can only move forwards. If you keep moving the same pawn, eventually it must reach the end of the board.

(This interactive feature requires Javascript to be enabled in your browser.)

The y coordinate gets smaller on every move, so it must eventually reach 0. This proves that the pawn will reach the end of the board in finitely many moves — it can’t get stuck in an “infinite loop”.

In other words, the y coordinate is a variant.

Example: factorial

The following algorithm computes the factorial of n:

def factorial(n):
    if n < 0:
        raise ValueError('n must not be negative')
    
    r = 1
    while n > 0:
        r *= n
        n -= 1
    return r

In this loop, n is a variant — it gets smaller on each iteration, and the loop stops when n is 0. Therefore, this algorithm will always terminate.

Example: insertion sort

Consider the insertion sort algorithm.

def insertion_sort(lst):
    n = len(lst)
    for i in range(n):
        j = i
        while j > 0 and lst[j-1] > lst[j]:
            tmp = lst[j]
            lst[j] = lst[j-1]
            lst[j-1] = tmp
            j -= 1

The inner while loop must terminate, because j gets smaller on each iteration, and the loop terminates when j is 0. If the other condition lst[j-1] > lst[j] is false, then the loop might terminate sooner — but it will terminate.

So, j is a loop variant.

Example: exponentiation

In the previous post we proved that the following algorithm never returns a wrong result:

def power(x, n):
    if n < 0:
        raise ValueError('n must not be negative')
    
    r = 1
    i = 0
    
    while i < n:
        r *= x
        i += 1
    
    return r

Let’s also prove that the loop must terminate. This example is a bit different, since there is no variable “counting down”, and the loop stops when i reaches n, not when something reaches 0.

This loop terminates because i keeps getting bigger, so it must eventually reach n. Put another way, the difference n - i keeps getting smaller. For example, consider the trace from computing power(3, 4):

n	i	n - i
4	0	4
4	1	3
4	2	2
4	3	1
4	4	0

The expression n - i gets smaller on every iteration, because i gets bigger but n stays the same. When n - i reaches 0, the condition i < n becomes false and so the loop terminates.

So, n - i is a loop variant.

Example: binary search

Consider the binary search algorithm:

def binary_search(lst, target):
    left = 0
    right = len(lst) - 1
    
    while left <= right:
        middle = (left + right) // 2
        value = lst[middle]
        
        if value == target:
            return middle
        elif value < target:
            left = middle + 1
        else:
            right = middle - 1
    
    return -1

This example is a bit more complicated, because there’s no variable counting up or down to a fixed limit.

Intuitively, binary search should terminate because the “range of possibilities” keeps getting smaller — this range is from left to right, so right - left could be a loop variant.

On each iteration, one of three things happens:

The algorithm might terminate immediately — so, no problem.
Or left becomes middle + 1.
Or right becomes middle - 1.

The loop condition is left <= right, and middle is halfway between them. So, either left gets bigger, or right gets smaller — either way, right - left gets smaller.

To prove the loop terminates, we need to show that left <= right eventually becomes false. This happens when right - left reaches −1, so the proof is complete.(2)

Loop variants

After these examples, hopefully it’s clear what loop variants are, and how we can use them to prove algorithms terminate. But let’s also give a proper definition.

A loop variant is an expression which:

Has an integer value.
Gets at least 1 smaller on every iteration of the loop.
When it gets below some threshold (usually 0), the loop condition becomes false.

Any loop with a variant is guaranteed to terminate, because an integer can’t keep getting smaller forever without falling below the threshold. Once it reaches the threshold, the loop condition becomes false, so the loop terminates.

A note on ‘for’ loops

In this post we’ve only analysed while loops. On the other hand, many algorithms use for loops:

for i in range(n):
    # ...

If a for loop iterates over a finite collection, such as a list, set or dictionary, or a finite range of integers, then the number of iterations is strictly limited to the size of that collection or range, so it can’t be an infinite loop.(3)

If necessary, we can still use variants to prove that a for loop terminates, by writing an equivalent while loop. This example iterates over the same range, and has the variant n - i.

i = 0
while i < n:
    # ...
    i += 1

Footnotes

Fundamentally, an assertion says that when a line of code is reached, the condition will be true. There is no way to assert that a line of code will be reached.
The variant is guaranteed to reach −1 for the same reason it’s guaranteed to reach 0. Alternatively, we could choose right - left + 1 as a variant.
This assumes the size doesn’t change during the loop. It’s generally a mistake to make a collection larger while you’re iterating over it with a for loop, and many languages will raise an error if they detect this happening.

Tags:
variants, correctness, algorithms