DS&A - Data Structures & Algorithms

Search Algorithms

Apr 092018

Search problems are problems which ask us to find something, given a description of it. Typical search problems are like these:

Given a list, find the index of a target value.
Given a list of numbers, find the first negative number.
Find the first non-empty line in a text file.
Find all rows in an SQL database table matching a given WHERE clause.
Given a tree, find a node which holds a target value.
Given an HTML document, find the first <h1> tag.
Find all tags in an HTML document matching a given CSS selector.
Starting at a given node in a graph, find all nodes reachable via edges.

A search algorithm is one which solves a search problem. In this post, we’ll see two algorithms which solve search problems on lists.(1)

Linear search

We’ve already seen one example — the linear search algorithm, which solves the following problem:

Problem: Given a list, find the index of a target value.(2)
- Input: a list.
- Input: a target value.
- Output: the target’s index, or −1 if the target is not in the list.
Example: The index of 13 in [8, 2, 1, 7, 13, 10] is 4.
Example: The index of "ham" in ["spam", "eggs", "ham"] is 2.
Example: The target value 10 is not in [1, 3, 2, 5, 4].

The linear search algorithm iterates over the input list, checks whether each value equals the target, and returns the index if it does. If it reaches the end of the list, then the target was not found, so the algorithm returns −1.

In Python, it can be implemented like this:

def linear_search(lst, target):
    for i, value in enumerate(lst):
        if value == target:
            # found it!
            return i
    
    # if the loop ends, target was not found
    return -1

This algorithm is simple enough to write without working through an example step-by-step. Still, it’s worth doing this to get a better understanding of how the algorithm works:

(This interactive feature requires Javascript to be enabled in your browser.)

This model shows the steps the computer takes when executing the linear search algorithm. But it also lets us do something the computer can’t do: we can look at the whole list and easily see where the target is, without trying values one at a time. Of course, that’s not one of the operations supported by a list.

A computer can’t use values from the list until it accesses them — either by the get operation, or by iterating over the list. The model below puts you in the role of the computer:

(This interactive feature requires Javascript to be enabled in your browser.)

Use the buttons to perform the linear search algorithm yourself.

In some sense, the linear search algorithm is the only solution to this search problem. If we find the target value, then we can “go home early”, but otherwise there is no shortcut — we do have to try every index before we can know for sure whether to return −1 or not.

We could try the indices in a different order, but the target is just as likely to be anywhere — so whatever order we choose is no worse than left-to-right.(3) Logically, no algorithm is better than linear search at this exact problem.

Binary search

On the other hand, what about other search problems?

Image credit: Warner Bros. Pictures

If we know nothing about the contents of the list, then the only way to eliminate each possible index is to try it. But suppose we know the list is in order: then if a value is too small, everything to its left is also too small. If it’s too big, everything to its right is also too big.(4)

Problem: Given a list which is in order, find the index of a target value.
- Input: a list which is in order.
- Input: a target value.
- Output: the target’s index, or −1 if the target is not in the list.
Example: The index of 13 in [1, 2, 7, 8, 10, 13] is 5.
Example: The index of "ham" in ["eggs", "ham", "spam"] is 1.
Example: The target value 10 is not in [1, 2, 3, 4, 5].

This is a realistic problem — data often needs to be in order, or sometimes it’s just convenient to keep it in order. For example, when you’re looking up a word in the dictionary, you don’t have to read it from beginning to end until you find the word you want, because the words are in alphabetical order.

Take advantage of the fact that the list is in order, using the model below:

(This interactive feature requires Javascript to be enabled in your browser.)

Notice that we can eliminate many possibilities just by looking at one value in the middle of the list — and the remaining possibile indices are always a contiguous range.

To turn this idea into an algorithm, we can keep track of this “range of possibilities” with two variables; one for the left boundary, and one for the right boundary. The binary search algorithm always tries the index in the middle of this range. The key idea is that when we see a value that’s too small, we shrink the left boundary; when we see a value that’s too big, we shrink the right boundary.

Here’s a model which shows these variables; the range from left to right is highlighted in blue:

(This interactive feature requires Javascript to be enabled in your browser.)

Try searching for each value in the list, and some values which aren’t in the list.

For this list of 7 items, binary search never tries more than 3 of them; that’s a big improvement on linear search, which would sometimes try all 7. The difference becomes much more extreme for longer lists — with a million items, binary search never tries more than 20 of them! Of course, linear search doesn’t need the list to be in order — but when we can use binary search, we probably should.

Warning

Linear search works by iterating over the list, but binary search works using the get operation. So, binary search is only appropriate when getting is more efficient than iterating.

That’s the case for array-lists (and arrays), but not for linked lists.

Here’s an implementation of the binary search algorithm in Python:

def binary_search(lst, target):
    left = 0
    right = len(lst) - 1
    
    # while the range isn't empty
    while left <= right:
        # integer division rounds down
        middle = (left + right) // 2
        value = lst[middle]
        
        if value == target:
            # found it!
            return middle
        elif value < target:
            # too small, shrink the left boundary
            left = middle + 1
        else:
            # too big, shrink the right boundary
            right = middle - 1
    
    # if the loop ends, the range is empty, so target was not found
    return -1

Linear search variations

The linear search algorithm is easy to adapt to other search problems on lists. To search for something else, we only need to change the function signature and the if statement’s condition.(5) The following model shows how:

def linear_search(lst, target):

for i, value in enumerate(lst):

if value == target:

# found it!

return i

# not found

return -1

(This interactive feature requires Javascript to be enabled in your browser.)

If we want to find something other than the index, we could change the return statements. Either way, the same basic algorithm works for almost any search problem on a list.

Footnotes

Both algorithms also work just as well on arrays.
In fact, the linear search algorithm finds the index of the first occurrence of a target value, in case it appears more than once.
That is, assuming we don’t pointlessly try the same index multiple times.
“Small” and “big” are the right words for numbers, but “before” and “after” are more appropriate for alphabetical order, or other orderings.
We can also change the variable names to be more descriptive, but this doesn’t affect how the algorithm works.

Tags:
linear search, binary search, search, algorithms

Comments

Jeff12 April, 2018Thank you so much