 # DS&A

Data Structures and Algorithms ## Algorithmic “Plans”

Mar 192018

“Plans” are the basic “blueprints” or “building blocks” for algorithms — they are canned solutions to common programming problems which are simple but appear in many variations. Thinking about plans makes it easier to understand code, because we can see the intentions rather than thinking about one line at a time. Each plan usually only solves part of a problem, so a given piece of code may use many plans, and some plans always use other plans.

In fact, “plans” are usually not thought about or taught explicitly, and they don’t have standard names — even the word “plan” is not widely used — so the names in this post are just what I call them. Nonetheless, researchers﻿(1) have found that “plans” are how expert programmers think about and write code, even if the experts don’t have names for them.

In this post we’ll see some of the most important plans:

If you’ve done any programming at all, some or all of these plans should already be familiar; but it’s useful to classify them, identify which problems they apply to, and think about how to compose them.

### For each

The most fundamental plan for a list, or another collection, is to iterate over its values — do something “for each value”, one by one. The code to achieve this is called a for-each loop.

for value in lst:
# ...

This loop iterates over `lst`, and the variable `value` holds one value at a time.

Sometimes we want to do something “for each index”. This might be to set new values at each index, or to access two lists at the same indices.

for i, value in enumerate(lst):
# ...

This code uses Python’s enumerate function to iterate over the index/value pairs. This gives us an extra variable `i` within the loop, for the index of `value`.

All of the other examples here will use these two kinds of loop, but the other plans in this post can be used with other kinds of loop.﻿(2)

### Filter

If we only want to iterate over some of the values, we need to select only the ones we want. You can recognise when this plan is needed from the problem statement:

• Find the sum of the positive numbers in a list.
• Find the largest even number in a list.
• Count how many strings in a list are shorter than 4 characters.

Notice how each of these problems needs us to do something for only some of the values — only the values which meet some description (highlighted in blue).

To achieve this, we use an `if` statement within the loop — for example, if we only want positive numbers:

for value in numbers:
if value > 0:
# ...

This code uses the “for each” and “filter” plans. Using an `if` statement this way, the loop simply skips over any values we don’t want.

The condition `value > 0` can be changed to whatever condition we want to filter for. Normally the “filter” condition depends on the value but not the index.

### Map

Sometimes we don’t want the values themselves, but we want to compute something for each value instead. You can recognise when this plan is needed from the problem statement:

• Find the sum of the squares of a list of numbers.
• Convert the temperatures in a list from Celsius to Fahrenheit in-place.
• Find the greatest age of the `Person` objects in a list.

Notice how each of these problems doesn’t use the values in the list directly; instead, we want something derived from each value individually. In these examples, we have to square each number individually, or convert each temperature from Celsius to Fahrenheit, or get the age of each `Person` object.

The best way to do this is usually to calculate the derived value first, and store it in a variable for further use:

for value in numbers:
square = value ** 2
# ...

Sometimes we want to “map” the values in-place, replacing the old values from the list with the new computed values, as in the temperature conversion example:

def celsius_to_fahrenheit_in_place(temperatures):
for i, celsius in enumerate(temperatures):
# formula for converting Celsius to Fahrenheit
temperatures[i] = 1.8 * celsius + 32

This code uses the “for each index” and “map” plans. We need the index `i` so we can set a new value at `temperatures[i]`.

### Collect in new list

Sometimes we want to “filter” or “map” the values in a list, and collect the results in a new list without destroying the original data. Here’s an example which converts temperatures from Celsius to Fahrenheit, this time using the “for each”, “map” and “collect in new list” plans:

def celsius_to_fahrenheit(temperatures_c):
temperatures_f = []
for celsius in temperatures_c:
# formula for converting Celsius to Fahrenheit
fahrenheit = 1.8 * celsius + 32
temperatures_f.append(fahrenheit)
return temperatures_f

There are three parts to the “collect in new list” plan:

1. Before the loop, create a new empty list for the result.
2. Within the loop, append each individual result to the new list.
3. At the end of the loop, return the list of results, or use it some other way.

These three parts aren’t next to each other in the code — we create the list before the loop, we append to it within the loop, and we return it once the loop finishes.

It’s very common for the different parts of a plan to be separated like this, which is why it’s easier to understand code by thinking about plans rather than by reading it line-by-line.

### Running total

If we want the sum or total of a list of numbers, it’s usually obvious from the problem statement — it will include a word like “sum” or “total”:

• Find the sum of the positive numbers in a list.
• Find the sum of the squares of a list of numbers.
• Find the total rainfall, given a list of the rainfalls for each day.

Totals are also needed to calculate statistics like the mean and standard deviation; when solving that kind of problem, look up a formula for the statistic you want, and check whether any totals are required.

To calculate a total, we can’t just write `numbers + ... + numbers[n-1]`. We need to use a loop to do one addition at a time, which means we need a variable to keep track of the running total, or the “total so far”.

Here’s the code to calculate the sum of the positive numbers in a list, using the “for each”, “filter” and “running total” plans:

def sum_of_positives(numbers):
total = 0
for value in numbers:
if value > 0:
total += value

The “running total” plan also has three parts:

1. Before the loop, create a variable for the running total, and initialise it to 0.
2. Each time we “see” a value, update the running total by adding it on.
3. After the loop, the total is complete, so return it (or use it some other way).

The initial value for “total” is 0 to make the result correct without handling the first value separately. Actually, 0 is also the proper total for an empty list, so this code gives the correct result in this special case without handling it separately.

### Counter

We often want to count how many of something there are. Again, this is easy to identify from the problem statement:

• Count how many strings in a list are shorter than 4 characters.
• Given a list of `Person` objects, return the number of people who are at least 18 years old.
• Find how many of the numbers in an input list are even.

The simplest thing to do is keep track of how many things we’ve “seen” so far, and increase the count by 1 each time we “see” another one. This works very much like the “running total” plan, except instead of adding the value seen, we just add 1.

Here’s an example which counts how many strings are shorter than 4 characters, using the “for each”, “filter” and “counter” plans:

def count_short_strings(strings):
counter = 0
for value in strings:
if len(value) < 4:
counter += 1
return counter

The “counter” plan also has three parts:

1. Before the loop, create a “counter” variable, and initialise it to 0.
2. Each time we “see” something being counted, increment the counter.
3. After the loop, the counter is complete.

A search problem is one where we need to find something, given a description of it. If there’s only one of them, or we don’t care which one we find, then the first one will do; or sometimes we want the first one specifically.

Typical problems look like this:

• Find the index of a target value in a list of strings.
• Find an index of any null value in a list of objects.
• Find the first negative number in a list.

Unless we have some extra information about the list we’re searching through, all we can do is look at each value one-by-one, and check if it meets the description.

The code looks like this:

def linear_search(strings, target):
for i, value in enumerate(strings):
if value == target:
return i
return -1

This example uses the “for each index” plan because we need to return the index, a “filter” for the target value, and a third plan which I call “go home early”. The “go home early” plan has two parts:

1. If we found what we’re looking for, then we already know the correct result, so we can return it immediately — the function “goes home early”.
2. Therefore, if we get to the end of the loop, we must not have found it, so handle this case separately — the function “goes home late”.

In this example, if the target is not found then the function “goes home late” by returning a “sentinel value”. Normally, the sentinel is −1 for “find the index of” problems, because −1 is not a valid index.

This example (find the index of a target value) is called the linear search algorithm, hence my name for this combination of plans. But we can use the same plans to do other kinds of searches; this example finds the first negative number in a list:

def find_first_negative(numbers):
for value in numbers:
if value < 0:
return value
raise ValueError('No negative number found')

This code uses the “for each”, “filter” and “go home early” plans. For this problem we don’t need the indices, and if no negative number is found, the function “goes home late” by raising an error.

### Best so far

Sometimes instead of wanting to find a particular thing, we want to find the “best” of something. This could be the highest number, the longest string, the `Person` object with the highest score, or the hit closest to the target.

Alternatively, “best” might mean the lowest number, the shortest string, the `Person` object with the lowest score, or the hit furthest away from the target — it depends what we’re searching for.

Put generally, there is something we want either the maximum or the minimum of. Some example problems:

• Find the largest number in a list.
• Given a list of strings, find the longest uppercase string.
• Find the index of the lowest even number in a list.

Again, assuming we have no extra information about the contents of the list, the solution is to look at each value one-by-one, and remember what the “best” thing we’ve seen so far is; each time we see something “better”, that’s the new “best so far”.

Here’s the “find the largest number” example, which uses the “for each” and “best so far” plans:

def find_largest_number(numbers):
largest = numbers
for value in numbers:
if value > largest:
largest = value
return largest

The “best so far” plan has a few parts:

1. Before the loop, declare a “best so far” variable — initially this is the first thing we “see”, at index 0.﻿(3)
2. Each time we “see” something “better”,﻿(4) we update the “best so far”.
3. After the loop, the “best so far” is the true “best”.

In this example, something is “better” if `value > largest`, because we want the largest number. If instead we want the smallest number, we would name the variable `smallest` and change the condition to `value < smallest`.

Here’s an example which finds the longest uppercase string. This time, something is “better” if `len(value) > len(longest)`, because we want to compare by length.

def find_longest_uppercase(strings):
longest = None
for value in strings:
if value.isupper():
if longest is None or len(value) > len(longest):
longest = value
return longest

This code uses the “for each”, “filter” and “best so far” plans.

This problem is a bit trickier, since `strings` might not be uppercase, so the only sensible initial value for `longest` is `None`. This means, after filtering, there are two reasons to update `longest`:

1. Whatever we find first is automatically the “best so far”.
2. Otherwise, something is “better” if it’s longer than what we have already.

This detail always emerges when combining the “filter” and “best so far” plans. This function will return `None` if no uppercase string is found.

#### Footnotes

1. See e.g. Soloway (1986) and Sajaniemi & Prieto (2005).
2. Other kinds of loops might read lines from a file or rows from a database, or they might pop from a stack or poll from a queue.
3. Since we’ve already “seen” the value at index 0, the loop could be changed to start at index 1.
4. This is not really the “filter” plan, despite having an `if` statement inside the loop, because the condition changes dynamically as `largest` gets updated. The next example shows how the “best so far” and “filter” plans might be used together.

## New comment

Your comment will not appear until a moderator approves it.

Atom

hosted on werp.site