Think Like a Coder: Introduction to Algorithms Part 2 – Digital Initiatives at the Grad Center

This a continuation of Think Like a Coder: Introduction to Algorithms Part 1. Be sure to also check out this semester’s Think Like a Coder workshop.

Introduction

Last time, we considered a simple program that was merely searching for a target element in a list. Now imagine a program that involves doing some sort of a calculation or additional check at each step—for instance, rather than checking if a certain item appears in a list of ingredients, we might have two recipes (two lists of ingredients), and ask how many items appear in both lists. One way to do this would be to look at one item in the first list, and then go through each item in the second list to see if it appears. In the worst-case scenario, we would have to go through the entire second list for each item in the first list. As the number of items in either list grows larger, the number of elements we check grows very quickly—for two lists of n elements each, it turns out we have to do n² checks in the situation just described. Even for a computer, this can start to be costly in terms of time as n grows very large.

Measuring Efficiency

The way that we measure the efficiency of a program is using a metric called big-O notation. All we really care about is how quickly the time or memory required (in the worst-case scenario) grows as a function of the number of input elements. Usually “n” is used for the input, but occasionally there will be more than one variable or a constant included.

In our first ingredient example, where in the worst case we had to go through the entire list one time, we would say that the time required was O(n) = n. Imagine an example where we would have to go through a list of n elements exactly 2 times. In that case, the time required is O(n) = 2n. However, since all we really care about is how the function O(n) changes as n grows very large, we still call this O(n) = n. The factor of 2 does not matter much—O(n) is still a linear function. There are lots of other possibilities, but for now it’s only important to understand the concept of how to measure efficiency and how it gives us a way to compare different algorithms to one another.

Sample Problem

Oftentimes, we can write our instructions for the computer (the program) in clever ways that make them more efficient. This might involve taking advantage of a property of the input, or thinking carefully about what exactly we are trying to find and return. Imagine a new problem: you are given a list of numbers that are sorted in increasing order. You have to find whether a target number appears in the list. For instance, does 7 appear in this list?

[1, 3, 4, 5, 6, 7, 8, 19, 223]

It’s not hard to tell that 7 appears by reading quickly from the start, but now imagine if the list were 100,000 elements long. In that case, you would probably not go one-by-one from the beginning—how would you do it instead? (And how many elements do you think you’d have to check in the worst case?)

You can take advantage of the fact that the list is sorted! You could look at an element in the middle of the list. If the target you are looking for is larger than that, you know that it must be to the right (if it exists). With just one check, you have already eliminated 50,000 elements from consideration. Now you have halved the size of the list, and you can use the same technique—treating the remaining 50,000 elements as a new list, start in the middle. If the target is larger than this number, continue by searching the right half of the list. If the target is smaller, search the left half of the list. Of course, if at any point you stumble upon the target number, then you have found it and you know it exists. If you eventually reduce the search space to a final element that is not equal to the target, then you can conclude that the list does not contain the target!

For a list of 100,000 elements, our initial naive solution would require us to check 100,000 elements in the worst case. With our cleverer solution, we only have to check 17 elements in the worst case! (2¹⁷ = 131,072)

Let’s try to write our cleverer solution in pseudo code.

While list contains more than 1 element:
    pointer = middle of list
    current element = list[pointer]

    If current element == target:
        Return true

    Else if element < target:
        List = right half of list, go back to start of while loop

Else if element > target:
    List = left half of list, go back to start of while loop

If only item in list is target:
    Return true
Else:
    Return false

What do you think the time complexity of this program is? (In other words, given a list of n elements, how many elements in the list do we have to check in the worst case, as a function of n?)

The answer is that it is logarithmic. (We write O(n) = log(n); we don’t really care that it is log base 2.)

Another way to think about this is to consider how many more elements in the search space we could accommodate with just one more check. We saw that it would only take 17 checks to find whether a target number exists in a list of 100,000 elements. That means that with 18 checks we could find a target in a list of more than 200,000 elements. With 19 checks we could find a target in a list of more than 400,000 elements, and so on. In other words, using x number of checks, we can effectively search through a list of size 2^x, because the search space is halved for each check.

So, we could write

2^x = n, or

x = log₂(n)

This is another way to see that O(n) = log(n)

This type of strategy is called a binary search, because at each step we are checking whether the target is larger or smaller than the current element, and it is an example of a technique we can use to make programs that are more efficient.

Conclusion

The good news is that lots of people have already come up with really clever ways to solve most of these kinds of problems. You might be reasonably wondering if you really need to learn algorithms. The answer is that you probably will never need to reinvent the wheel and come up with an optimal way to do simple tasks like searching or sorting. However, I think there are a few really compelling reasons to learn about basic algorithms:

For one, it helps you understand how a computer program works. Algorithms have to be exhaustive and account for every possible case. Potentially unexpected or problematic “edge cases” crop up in many surprising ways in different programs, and it’s important to understand what they are and how to anticipate them. Learning algorithms and practicing solving problems also gives you practical coding experience, because you can start with simple problems that can be solved with only the most basic coding commands and tools. It’s a great way to get started writing your own small programs. Once you’ve learned a little bit and written your own code, it becomes much easier to read other code.

Say you want to do a textual analysis of a book you have on your computer. You can easily google this and find a program someone has already written. If you know how to read the code, it will be very easy to use someone else’s code and quickly debug it if problems arise. If you don’t have any experience reading code, you might well struggle for hours to solve a problem that could be something as simple as missing parentheses! Learning to read and understand code—even on a basic level—opens up a whole world of possibilities!

Are you interested in learning more about algorithms or getting into coding? Here are some resources I recommend:

For algorithms and python, register to attend this semester’s Think Like a Coder workshop, or come to the Graduate Center’s Python User Group (PUG) meetings.

For practicing algorithms, check out leetcode.com. You can create a free account and solve algorithmic problems in your browser. There are thousands of problems on the site, ranging from very simple to incredibly complex. They also have some free learning resources available.

To learn to code, a great resource is https://www.freecodecamp.org/learn.

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.