DS&A

Data Structures and Algorithms

Invariants

May 14, 2018

An assertion is a claim about the state of a running program — it says that some logical condition should be true whenever a particular line of code is reached. In the previous post, we only considered assertions in code where each line is executed once, in sequence.

Of course, real algorithms are more complicated, because loops and recursion cause some code to be executed multiple times, not in a simple sequence. In this post we’ll explore invariants, which are needed to prove correctness of non-trivial algorithms.
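
To give a flavour of the idea (a minimal Python sketch, not taken from the post itself): an assertion placed inside a loop acts as an invariant, a claim that should hold on every single iteration.

    def sum_list(numbers):
        total = 0
        for i in range(len(numbers)):
            # Invariant: total equals the sum of the first i numbers
            assert total == sum(numbers[:i])
            total += numbers[i]
        return total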

Is Your Algorithm Correct?

May 14, 2018

When we write an algorithm to solve a problem, the algorithm is only correct if it does actually solve that problem. That should be obvious! But how do we know whether an algorithm is correct?

This is a genuine concern because algorithms often aren’t correct, and we need to debug them. We need a way to analyse an algorithm to decide if it really does solve the problem or not.

Static Analysis

May 07, 2018

In the previous post we analysed how efficient the insertion sort algorithm is. We did dynamic analysis — we analysed the algorithm by actually running it.
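
A dynamic analysis can be as simple as this Python sketch (my own illustration, not code from the post): run an implementation on some input and measure how long it takes.

    import random
    import time

    def insertion_sort(lst):
        for i in range(1, len(lst)):
            j = i
            # Shift lst[i] leftwards until it is in the right place
            while j > 0 and lst[j - 1] > lst[j]:
                lst[j - 1], lst[j] = lst[j], lst[j - 1]
                j -= 1

    data = [random.random() for _ in range(5000)]
    start = time.perf_counter()
    insertion_sort(data)
    print(time.perf_counter() - start, "seconds")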

Technically, we analysed an implementation of the algorithm; perhaps a cleverer implementation could be more efficient. From one perspective, the implementation is what we really care about:

  • The implementation is what will actually be run in the real world.
  • A more efficient implementation will make a real piece of software faster.

On the other hand, it’s hard to do an accurate dynamic analysis:

  • There will always be some measurement error.
  • It might take a long time to do the analysis.
  • Running time depends on external factors like how fast the computer is, or what other programs are running. Results from different computers, or even the same computer on different days, may be incomparable.

In this post, we’ll get around these problems by making a series of simplifications. We’ll do a kind of static analysis called asymptotic analysis — this means we’ll analyse algorithms without running them, and we’ll assume the inputs are large.

‘Big O’ Notation

May 07, 2018

By making a series of assumptions and considering only “large” inputs, we can analyse how efficient an algorithm is without actually running it. The result of this analysis is a mathematical formula called the complexity (or time complexity) of the algorithm. For example, we derived the formula n² for the selection sort algorithm.

Using standard notation, we would say that selection sort’s complexity is O(n²), or that selection sort is an O(n²) algorithm.

This formula says, very roughly, how much “work” the algorithm has to do as a function of n, which represents the “input size”. In this post, we’ll see how to make sense of this formula, and how to derive it with as little algebra as possible.
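
One concrete way to see where n² comes from (a rough sketch, not the post’s derivation): count the comparisons selection sort makes as n grows.

    def selection_sort_comparisons(n):
        # Selection sort makes (n-1) + (n-2) + ... + 1 = n(n-1)/2
        # comparisons, whatever the input order.
        comparisons = 0
        lst = list(range(n, 0, -1))
        for i in range(len(lst)):
            min_index = i
            for j in range(i + 1, len(lst)):
                comparisons += 1
                if lst[j] < lst[min_index]:
                    min_index = j
            lst[i], lst[min_index] = lst[min_index], lst[i]
        return comparisons

    for n in (10, 100, 1000):
        print(n, selection_sort_comparisons(n))  # 45, 4950, 499500

The counts grow roughly like n²/2, and the notation deliberately ignores the constant factor ½.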

How Fast is Your Algorithm?

May 07, 2018

Efficiency is important; people don’t like waiting for computers to slowly do things that ought to be fast.


So far we’ve been thinking about this in vague terms.

Now it’s time to go into detail: how do we know how fast an algorithm is?

Choose the Right Data Types

Apr 30, 2018

One of the most important and overlooked parts of designing algorithms is choosing the right data types. We often assume that the types of data an algorithm will use are determined by the inputs and the output — but it might help to use a temporary collection of some data, if that collection has useful operations. Those operations are determined by the collection’s type.

In this post we’ll look at a couple of examples where using the right data types makes the problem simpler.
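
To give a flavour of the idea (my own illustration, not necessarily one of the post’s examples): checking whether a list contains a duplicate becomes simple if we build a temporary set, because a set offers a fast “have I seen this already?” operation.

    def has_duplicates(lst):
        seen = set()  # temporary collection with a cheap membership test
        for value in lst:
            if value in seen:
                return True
            seen.add(value)
        return False

    print(has_duplicates([3, 1, 4, 1, 5]))  # True
    print(has_duplicates([2, 7, 1, 8]))     # False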

Recursive Sorting

Apr 16, 2018

In the previous post we saw two iterative sorting algorithms: selection sort and insertion sort both work using a loop to put the right values in the right places, one at a time. They are two of the simplest sorting algorithms. But they’re not very efficient — for long lists, they’re too slow.(1)

In this post we’ll see one of the simplest efficient sorting algorithms, called merge sort. Rather than sorting items one-by-one, it works by recursively sorting shorter lists.
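
In outline, merge sort looks something like this (a minimal Python sketch; the post’s own presentation may differ):

    def merge_sort(lst):
        if len(lst) <= 1:
            return lst  # a list of 0 or 1 items is already sorted
        mid = len(lst) // 2
        left = merge_sort(lst[:mid])   # recursively sort each half...
        right = merge_sort(lst[mid:])
        return merge(left, right)      # ...then merge the sorted halves

    def merge(left, right):
        # Repeatedly take the smaller front item of the two sorted lists
        result = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1
        return result + left[i:] + right[j:]

    print(merge_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]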

Iterative Sorting

Apr 16, 2018

Before writing an algorithm, it’s always worth thinking about how you would solve the same problem “by hand”. After all, you can’t write instructions for how to do something if you don’t know how to do it yourself.

The problem we’re currently interested in is sorting — given a list, put its values in order. There are hundreds of algorithms which solve this problem, but ultimately every sorting algorithm does one of two things:

Breadth-First Search

Apr 09, 2018

Depth-first search (DFS) is a fundamental algorithm for working with trees, because it gives us a way to “visit” every node in the tree, once each.(1) That’s important, because visiting every node is a subproblem of many other problems on trees.

DFS explores the full depth of one subtree before moving onto the next one. If that’s the order we want to visit the nodes in, or the order doesn’t matter, then DFS is fine; otherwise, we need a different algorithm. We’ll see one in this post.
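
That different algorithm is breadth-first search (BFS), which visits nodes level by level, nearest the root first. A minimal sketch (the Node class is my own convention, not the post’s):

    from collections import deque

    class Node:
        def __init__(self, value, children=()):
            self.value = value
            self.children = list(children)

    def breadth_first_order(root):
        # A queue holds nodes waiting to be visited; children join the
        # back of the queue, so whole levels are visited in turn.
        order = []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node.value)
            queue.extend(node.children)
        return order

    tree = Node(1, [Node(2, [Node(4)]), Node(3)])
    print(breadth_first_order(tree))  # [1, 2, 3, 4]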

Depth-First Search

Apr 09, 2018

Search problems vary in two ways: what do we want to search for, and what type of collection are we searching in?(1)

When we need a search algorithm, the data structure we want to search is more important than what we want to find. We could even say that linear search and binary search work on different data structures: linear search uses a list, and binary search uses a list which must be kept in order.(2)

Search Algorithms

Apr 09, 2018

Search problems are problems which ask us to find something, given a description of it. Typical search problems are like these:

A search algorithm is one which solves a search problem. In this post, we’ll see two algorithms which solve search problems on lists.(1)
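
Judging from the later posts, those two algorithms are linear search and binary search; here is a minimal sketch of each (my own code, not the post’s):

    def linear_search(lst, target):
        # Check each item in turn; works on any list
        for i, value in enumerate(lst):
            if value == target:
                return i
        return None

    def binary_search(lst, target):
        # Repeatedly halve the range being searched; lst must be sorted
        lo, hi = 0, len(lst) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if lst[mid] == target:
                return mid
            elif lst[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return None

    print(linear_search([4, 2, 7], 7))        # 2
    print(binary_search([1, 3, 5, 7, 9], 5))  # 2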

Algorithmic “Plans”

Mar 19, 2018

“Plans” are the basic “blueprints” or “building blocks” for algorithms — they are canned solutions to common programming problems which are simple but appear in many variations. Thinking about plans makes it easier to understand code, because we can see the intentions rather than thinking about one line at a time. Each plan usually only solves part of a problem, so a given piece of code may use many plans, and some plans always use other plans.
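
For instance, one of the most common plans is the “running total” (or accumulator) plan, sketched here in Python as a generic illustration rather than an example from the post:

    def average(numbers):
        # "Running total" plan: initialise an accumulator before the
        # loop, update it once per item, use the result afterwards
        total = 0
        for n in numbers:
            total += n
        return total / len(numbers)

    print(average([2, 4, 6]))  # 4.0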

A Problem-Solving Process

Mar 19, 2018

A computer programmer is somebody who converts computational problems into computational solutions. This is a “meta-problem”:

  • Given a problem, write a computer program which solves it.
    • Input: a problem statement.
    • Output: a computer program.

A computer can’t do this — writing programs requires insight and ingenuity.(1) But there are some systematic processes we can follow when writing programs, so most of the time we don’t have to hope for a “eureka!” moment.

Graph Problems

Mar 05, 2018

Graphs and networks are useful because they’re very general — graphs show up almost everywhere. Given that there are so many different examples of graph structures, it’s perhaps surprising that there aren’t actually very many different kinds of problems involving graphs.

That is, there are many computing problems which graphs can help with, but not many kinds of problem. Usually, after translating from the problem domain into the language of graphs, the problem turns out to be one of a relatively small number of “classical graph problems”.
