Computing the Greatest Common Divisor

April 30, 2009

Here comes my take at one of the most ubiquitous algorithms you can find googling

The least common multiple of 2 numbers, lcm(a,b), is the smallest number that is divisible by both a and b. The greatest common divisor of 2 numbers, gcd(a,b), is the largest number that divides both a and b. But just how do you go about calculating them? A most important property is that one can be calculated from the other based on the relation lcm(a,b)·gcd(a,b) = a·b, which ends up meaning that in practice the gcd is always calculated first. But lets not get ahead of ourselves…

Brute Force
I will disregard the dumber brute force approaches, such as “to get gcd(a,b), try all numbers down from min(a,b), until you find one that divides both a and b”, but a very resource wasteful implementation could use a sieving approach for the lcm. See, the lcm has to be smaller than a·b, which is definitely a multiple of both a and b. So if we set up an array of zeros of size a·b, we could use sieving to first set all multiples of a to 1, then do a search over all multiples of b to find the first common multiple. This approach will definitely break down for even moderately large a or b, but it is probably as good as most of us would get on our own without wikipedia…

from time import clock

def lcmBruteForce(a, b, verbose = False) :
    t = clock()
    sieve = [0]*a*b
    a,b = max(a,b), min(a,b)
    for j in range(a-1,a*b,a) :
        sieve[j] += 1
    ret = 0
    for j in range(b-1,a*b,b) :
        if sieve[j] == 1 :
            ret = j + 1
    t = clock()-t
    if verbose :
        print "Calculated the lcm of",a,"and",b,"in",t,"sec."
    return ret

Integer Factorization
In my case I believe it was Ernesto, back in 1985 or 1986, who showed me into the secret of calculating the lcm and the gcd from the prime factorization of both numbers:

  • to find the gcd, keep only factors common to both and b, raised to the smaller exponent, and
  • for the lcm, keep all factors, common or not, raised to the largest exponent.

This approach will prove to be very inefficient, but lets give it a go. We will reuse the factorAndSieve function from the post on integer factorization. I have also written an auxiliary function, listToDict, that turns the list of factors with repeating entries, into a dictionary, where keys are factors and corresponding entries are exponents:

import time

def list_to_dict(seq) :
    ret = dict()
    for j in seq :
        if j in ret :
            ret[j] += 1
        else :
            ret[j] = 1
    return ret

def gcd_factor(a, b, verbose = False) :
    t = time.clock()
    aa = list_to_dict(factorAndSieve(a))
    bb = list_to_dict(factorAndSieve(b))
    rr = dict()
    for j in bb :
        if j in aa :
            rr[j] = min(aa[j], bb[j])
    ret = 1
    for j in rr :
        ret *= j**rr[j]
    t = time.clock() - t
    if verbose :
        print "Calculated the gcd of",a,"and",b,"in",t,"sec."
    return ret

The Euclidean Algorithm
When we looked at the calculation of the modular multiplicative inverse, we already took a look at the extended Euclidean algorithm, and pointed out it gives the gcd of the two passed numbers as a by-product. Not that it was an interesting thing then, since when calculating multiplicative inverses, we normally are interested in relatively prime number pairs, i.e. gcd(a,b) = 1. But if the adjective extended is dropped from the name, we are left with the Euclidean algorithm, which is the standard to calculate the gcd.

The basic idea behind the algorithm is the observation that if a > b, and g is the gcd of a and b, then it is also the gcd of a – b and b. To make it more efficient, rather than subtracting, we can cut corners by using the equivalent fact that, if g is the gcd of a and b, then it is also the gcd of b and a mod b. If we perform that same operation recursively, after a finite number of iterations, the result of the modular operation will eventually reach zero, and the gcd then is equal to the last non-zero remainder. Or

import time

def gcd_euclid(a, b, verbose = False) :
    t = time.clock()
    aa, bb = a,b
    while b != 0 :
        a, b = b, a%b
    t = time.clock() - t
    if verbose :
        print "Calculated the gcd of",aa,"and",bb,"in",t,"sec."
    return a

The Binary GCD Algorithm
Euclid’s algorithm is so fast that there really is not much point in trying to improve it. If we were using C rather than python, we could have taken advantage of the fact that a computer that does all its arithmetic in binary, and use the binary GCD algorithm. In python the resulting code is less clear, and the run times are slightly slower, probably owing to the fact that bitwise operations “don’t get optimized down to single CPU instructions (…) like in C,” or so says a commenter here… The truth is I haven’t been able to implement the algorithm so that it runs faster than the standard Euclidean one, so I’ll spare you my code…