You probably hear the term algorithm a couple of times a week. YouTubers especially like to praise or blame “the algorithm” for their success or failure. In this article, I want to go a little more in-depth and explain what an algorithm is and what it can do for you.
From a very abstract point of view, an algorithm is a well-defined computational procedure that describes the necessary steps to create output for a given input and establishes an input/output relation. However, this description doesn’t bring us very far.
Take, for example, a recipe for a pie. Such a recipe firstly states all the ingredients needed to make a pie; this usually includes flour, butter, eggs, and sugar. After you get all the ingredients, the recipe describes all the steps to put everything together to get a pie. The list of ingredients can be seen as the input and the pie as the output of an algorithm that describes how to make a pie.
However, recipes often leave a lot of room for interpretation and sometimes don’t detail enough to get the same result as the recipe creator. A proper algorithm gives you exact instructions on how to turn a given input into an output. Those instructions can be written down in a natural language such as English, obviously in program code, or even hardware. As natural language often takes too many words to describe something precisely and program code of a programming language such as Python can be too implementation-specific, algorithms are often written down in pseudocode, which can be seen as a mix of natural language, mathematic notation, and program code. Pseudocode for an algorithm that calculates the factorial of a given number could look like this:
1
2
3
4
5
6
7
8
9
Input: Integer n
Ensure: n >= 0
FACTORIAL(n):
x <- 1
FOR i from 1 to n DO:
x <- x * i
END
RETURN x
Pseudocode allows you to only capture the essence of a given algorithm without thinking about implementation details. The pseudocode above also provides a constraint n >= 0
necessary to ensure that the algorithm produces the correct output. This brings us to the correctness of an algorithm. An algorithm is correct if and only if the algorithm halts and produces the correct output for every possible input. An incorrect algorithm can have devastating consequences when deployed in an environment that could harm people. However, showing the correctness of an algorithm is not trivial, and it’s own field inside of computer science. An example for showing the correctness of the FACTORIAL
algorithm can be found over here: Khan Academy: Verifying an Algorithm
So an algorithm is an accurate description for turning an input into an output and is, therefore, a tool that helps solve computational problems. For which I want to give you some examples now.
Sorting
The most prominent and widely used algorithms are sorting algorithms. A sorting algorithm rearranges a list of input data according to a comparison criterion. Such a criterion could be the value of a number or the length of a string. During the process of sorting, the input data elements are compared to each other with the comparison criterion. How many elements are compared to each other is up to the algorithm designer and will have the most significant impact on the run time of the sorting algorithm.
Sorting can be found almost everywhere, for example, in a database of addresses where you need to find the email address of a friend. You could go through all the entries and check if it is the person you are looking for, but that can become very tiring. Due to this, the addresses are first sorted by the database for you, so you can scroll to your friend’s name by following the alphabet.
Another example is finding the maximum element in a list and determining if it is unique. When the list isn’t sorted every element has to be checked. Firstly, if it is the maximum element, and secondly, if this maximum element appears more than once. In a sorted list the maximum elements are either at the beginning or the end in, depending on the comparison criterion. From there you can quickly retrieve the maximum element and check if it is unique by checking if the neighboring elements have the same value.
Routing
Another important class of algorithms is routing algorithms. A routing algorithm allows you to find the shortest path between two points. Without such algorithms, delivery drivers would have to guess every time how to reach your address to deliver a parcel. Not only do drivers rely on routing algorithms also the text you retrieve from this article has to find a path from the server to your computer or phone. The whole internet is built upon routing algorithms to send data from one computer to another in a fraction of a second. Hence, the name router because it is responsible for routing data chucks from one network to another.
Cryptography
While talking about data traveling through routers on the internet, this data should be encrypted such that your personal information stays private. For encrypting and decrypting data, cryptography algorithms are used to encrypt data that only the true receiver is able to decrypt and verify that the data was actually sent by the true sender. Those algorithms belong to the public key and signature algorithms.
Linear Programming and Optimization
Many business decisions depend on making the best decisions for a purchase or investment. For example, a fast-food chain wants to open up a new restaurant in a large city, and there are multiple different spots at which they can rent out a new building. However, finding the best one involves numerous factors such as demographics in the building’s neighborhood, rent, accessibility by cars, pedestrians, delivery vehicles, competitors in the area, etc. Usually, an algorithm from the linear programming category is chosen to solve such a problem.
Runtime
For most of the problems above, a naive solution can be found pretty effortlessly, which usually is a brute force approach where every possible combination is tried out; however when you deal with more data than just a handful of elements, those algorithms will take a very, very long time to produce and output.
An algorithm can be divided into basic operations or steps, which can be summed up to get an implementation-independent runtime evaluation. For example, the FACTORIAL
algorithm above can be split into the following basic operations:
x <- 1
- Check the for loop condition
x <- x * i
- Increase
i
by one (done implicitly by the for loop) RETURN x
The basic operations of 1. and 5. are only executed once when running FACTORIAL
while 2. to 3. are executed n
times. With this method, the implementation-independent runtime of FACTORIAL
could be described as:
There is a more abstract way to describe the runtime of an algorithm that makes the runtimes of different algorithms even more comparable, which is called the big-O notation that I’m going to cover in a later article.
Data Structures
When talking about the runtime of an algorithm, it is equally important to talk about the data structures an algorithm uses to organize the data it deals with. Take, for example, the Python list
and dict
, both data structures that allow you to store data in an ordered manner. However, retrieving an element from those data structures can result in vastly different runtimes when you want to get an element from a list
that fulfills a certain property, you have to check every list
element in the worst-case scenario to find that element. But if you construct a dictionary where the key is the element property you are interested in, and the value is the element itself, you can retrieve the element with a single access and save a lot of time.
NP-completeness
For some computational problems, there are only solutions known that have to try out every possibility to find the optimal solution in the worst-case. This class of algorithms is called NP-complete, while verifying that a correct solution was found can be done in polynomial time, meaning that you only need \(n^x\) basic operations to perform the check, finding the optimal solution employs a brute-force search which takes exponential runtime \(exp(n)\). A fascinating property of NP-complete algorithms is that if someone finds a solution for one NP-complete algorithm that runs faster than in exponential time, this can be transferred over to every other NP-complete algorithm. This is also known as \(P\) vs. \(NP\) and is one of the millenium prize problems and when you solve one of the millennium problems, you will be awarded a prize of one million US dollars.
NP-complete problems are quite common. One of the most famous examples is the traveling salesman problem. Imagine a salesman that wants to visit different customers living in different cities. Now the question is, what is the optimal path to visit every customer while traveling the shortest distance and thereby taking the least amount of time. There are intelligent solving strategies for NP-complete problems that find a solution in a reasonable amount of time while keeping the amount of input data small, and you should use those when coming a across such a problem.
Conclusion
Algorithms are a crucial part of computer science, and knowing about their runtime and use-cases makes you a great computer scientist and programmer because that enables you to solve complex computational problems efficiently. The more you work with algorithms and data structures, the more you see problems that could be solved because algorithms are everywhere in our daily lives.