A prime number is a natural number larger than 1 which is only divisible by 1 and itself. Simple, isn’t it? Euclid already proved quite a while ago that there are infinitely many of them, so lets try and write a function that returns all primes below a given number, using the most straightforward approach. What would that be? Well, there is nothing less sophisticated than to try and divide each number by every integer larger than 1 and smaller than the number itself, and if we never get a zero remainder then the number is prime. Putting it in python:
from time import time def primeList1(n, verbose = True) : t = time() ret =  for j in range(2,n+1) : jIsPrime = True for k in range(2,j) : if j % k == 0 : jIsPrime = False break if jIsPrime : ret += [j] if verbose : print "Calculated primes to",n,"in",time()-t,"sec." return ret<br /></pre><br />
Do notice the feeble attempt at code optimization, with the
break statement that ends the loop as soon as a divisor is found… If you try it out, it will work fine. Use it to calculate all prime numbers below 1,000, and it will probably give you an instant answer. 10,000 delays a second or two, nothing too worrying. If you try it with 100,000, you’ll have time to take your coffee break before it finishes, and adding any more zeroes will require leaving your script running overnight… Not too surprising really, since to determine that, for example, 9973 is indeed a prime number, our program is having to do trial division by every number between 2 and 9972, both inclusive. All in all, the time required grows with n2.
There are several very simple improvements that can speed things up dramatically, and which require a little, but just a little, mathematical insight…
First, we don’t need to test every number between 2 and 9972 to determine that 9973 is prime; we can stop after the largest integer smaller than the square-root of 9973, i.e. 99. Why? Well, if a number is divisible by a number larger than its square-root, it will also be divisible by the quotient of that division, which is necesaarily smaller than the square-root. And since we have already tried (and failed) to divide by those smaller numbers, there is no need to keep going…
The resulting code is very similar to the previous one:
from time import time from math import sqrt def primeList2(n, verbose = True) : t = time() ret =  for j in range(2,n+1) : jIsPrime = True kMax = int(sqrt(j)) for k in range(2,kMax+1) : if j % k == 0 : jIsPrime = False break if jIsPrime : ret += [j] if verbose : print "Calculated primes to",n,"in",time()-t,"sec." return ret
Do try it, and feel the thrill of the new speed of the algorithm. If you are after all primes below 10,000, this new function will probably get them for you… a hundred times faster!!! If you try it for larger numbers, you will find even better speed increases, because the time required by our algorithm now grows with n1.4.
But we are still doing a lot of superfluous checking. Think again of 9973. Once we have tried to divide it by 2 and failed, there is no point in checking with 4, 6, 8 or any other multiple of 2. So yes, we could double the speed of our algorithm by checking only odd numbers, but lets spend a little more time thinking before going back to coding… Because, after failing with 2, once we turn our attention to 3 and again fail, what’s the point in trying 6, 9, 12 or any other multiple of 3? If you keep the reasoning going, it becomes obvious that you only need to do trial division by prime numbers, since by the time we test whether a composite number divides 9973, we have already tested all its prime factors, and if those have failed, this one will as well.
Now wait a minute! Isn’t finding whether a number is prime or not precisely what we are after? How are we going to figure out what numbers to do trial division with? Well, we already saw in our first optimization that our division trials only need to go up to the square root of the number being tested, so we only need to know the first prime number, 2, and afterwards we can use the list of prime numbers that we are building to generate the additional items… The code to do that would look something like this:
from time import time from math import sqrt def primeList3(n, verbose = True) : t = time() ret =  for j in range(2,n+1) : jIsPrime = True kMax = sqrt(j) for k in ret : if k > kMax : break if j % k == 0 : jIsPrime = False break if jIsPrime : ret += [j] if verbose : print "Calculated primes to",n,"in",time()-t,"sec." return ret
If you test this code you’ll surely notice the new speed increase. This last optimization doesn’t make it a hundred times faster, but on my computer the new code is 3.5 faster than the previous one when calculating all primes below 100,000 and 5.5 times faster when calculating all primes below 1,000,000. And again, what really makes the new code more efficient is not that 4x increase in speed, but the fact that, the larger our input number gets, the better the new code is. It seems that now time required grows with n1.25…
This last version holds the fundamental ideas of really fast code, i.e. skipping multiples of primes, but still relies on the slow and time consuming operation of trial division. Getting rid of it is the next optimization, which we will see in the next post.