When I wrote the first post of the FFT series I confessed it was out of boredom from number theory. That statement got me into a coffee break discussion with a friend and mathematician, who claimed that the FFT is as number theory as it gets, and went on to explain why. Since I usually don’t understand a word of what he is saying when he gets started on one of this rants, which invariably lead to Riemann’s zeta function, I just pretended to acknowledge his doubts, and then forgot about the conversation altogether.

Well, a few weeks down the road it seems he was right: while trying to make my FFT code work, I have found myself dealing with all sorts of number theoretical stuff, ranging from simple ones, like factorization, to arcane, as the Chinese remainder theorem. And since I have this thing about complete expositions, I’m afraid that this interleaving of apparently unrelated number theory posts with the FFT ones is going to be the norm. As a matter of fact the odd post on the Euclidean algorithm was supposed to be a stepping stone to an improved radix 2 FFT, which I finally managed to do without.

But lets get going with today’s subject: computing the divisors of a positive integer. May not seem a very sexy topic at first glance, but there’s beauty in it, as the striking Divisor Plot website put together by Jeffrey Ventrella proves…

Getting into the algorithmic treatment of the problem, there is of course the dumber approach, on the line of “to compute all divisors of a number, do trial division by all numbers smaller than it.” I will not pursue that line of thinking though, and will focus on computing the divisors once the prime factorization of the number is known, which is the efficient way of dealing with this problem.

So I will assume that we know the expansion of the number into its prime factors as

,

where the function ω(n) represents the number of distinct prime factors, p_{i}, each appearing with the corresponding exponent, a_{i}.

A divisor will have a subset of the prime factors of n as its own factors, so we could represent any divisor as a list of ω(n) numbers, the i-th ranging from 0 up to and including a_{i}. This scheme makes it very easy to calculate the total number of divisors as

.

**Unordered Generation of Divisors**

Furthermore, it points out that the computation of all the divisors could be achieved using ω(n) nested loops. Of course the number of loops to nest will vary from input to input, so recursion is probably the best of choices, or so thinks Tim Peters, as the code below has only cosmetic modifications to what he posted on the Python-list@python.org mailing list:

def divisors(factors) : """ Generates all divisors, unordered, from the prime factorization. """ ps = sorted(set(factors)) omega = len(ps) def rec_gen(n = 0) : if n == omega : yield 1 else : pows = [1] for j in xrange(factors.count(ps[n])) : pows += [pows[-1] * ps[n]] for q in rec_gen(n + 1) : for p in pows : yield p * q for p in rec_gen() : yield p

This approach, while extremely fast, is not much use if you want to produce the divisors sorted, as you will then have to first generate and store all divisors, then sort them.

**Ordered Generation of Divisors**

If what you are after is not the full set of divisors, or any arbitrary subset of them, but something like “the first divisor larger than x” or “the last divisor smaller than x,” it may be worth trying other algorithms which, while being less efficient, manage to produce the divisors properly sorted.

There’s little literature on the subject, so its hard to find a valid benchmark . Even PARI/GP, which is the golden standard on computational number theory seems, from the little sense I have managed to make out of their Subversion repository, to be generating all divisors (in a similar manner to above) and then sorting them.

So here’s a calculation scheme that can be effectively modified to yield ordered divisors… We start with a pivot value of 1, and an ordered list of all prime factors. Before every factor we split our flow in two:

- one branch skips the factor, and proceeds to the first factor different from the current one, keeping the pivot unchanged
- the other branch first updates the pivot, multiplying it by the factor, yields the result, and then proceeds to the next factor, with the modified pivot.

The flow is probably easier to understand with the example below, showing the branching when computing the divisors of 540 = 2^{2}·3^{3}·5. Blue arrows skip to the first different factors, red ones go to the next factors, a line going through a factor implies multiplication by it…

To keep track of this structure, I’m going to get fancy, and will build a tree-like data structure, where each item will be a list, holding the factor in the first position, and pointers, i.e. nested lists, to the next elements in the second and third ones. Again, to make things clear, here’s the data structure for 540, using the same color code as before…

Rather than traversing this structure recursively, we will approach it in a more structured way, by keeping a separate list of factors that can be visited in a next step. In the example above, when we start out, we can visit the first factor (a 2), or skip to the third (the first 3), so the list will hold two elements. If we select going to the first 2, we then have two new options, i.e. going on to the second 2, or taking the shortcut to the first three. Note that the two instances point at the first three are different, since once has a pivot value of 1 (and will thus generate the divisor 3), while the other has it set to 2 (an will generate the divisor 6).

If we don’t care about the other of divisors, this method can even be implemented by hand, by keeping a list of pivots and remaining factors, and on each step erasing the first element of the list and generating a divisor, plus adding up to two new items at the bottom. The first steps, plus the resulting list of divisors, are worked out below, again for the case of n = 540.

Now, while when doing it on paper there is not much more choice other than processing the items sequentially, we could choose to process always the element that produces the smallest divisor, and thus produce them in order. The simplest implementation of this idea would simply use a sort on the list of processable items before yielding the next divisor…

def divisors_ordered(factors) : """ Generates all divisors, ordered, from the prime factorization """ factors = [(j,factors.count(j)) for j in sorted(set(factors))] omega = len(factors) fcts =[factors[0][0],[],[]] aux = [fcts, None] for j in xrange(omega) : p,a = factors[j] if omega - j > 1 : aux[1] = [factors[j+1][0],[],[]] else : aux[1] = None for k in xrange(1, a) : b = [p,[],[]] aux[0][1] = b aux[0][2] = aux[1] aux[0] = b aux[0][1] = aux[1] aux[0][2] = aux[1] aux[0] = aux[1] div_tree = [[fcts[0],1, fcts]] yield 1 while div_tree : div_tree.sort() divisor, pivot, pointer = div_tree.pop(0) yield divisor if pointer[1] : div_tree += [[divisor * pointer[1][0] , divisor, pointer[1]]] if pointer[2] : div_tree += [[pivot * pointer[2][0], pivot, pointer[2]]]

Of course doing a full sort before yielding every divisor kills the efficiency of this algorithm. The fancy (or should I simply say correct?) way of keeping an ordered list, popping the smallest item only, and pushing arbitrary elements, is with a priority queue, which I will implement as a binary heap. Since we will be building our heap from scratch, we only need to worry about push and pop functions.

def push_heap(item, heap = None, compare = cmp) : """ Pushes an item into a min-heap, or creates one. """ heap_size = None if not heap : heap = [item, 1] else : heap_size = heap.pop() heap += [item, heap_size + 1] i = heap[-1] - 1 val = heap[i] heap_ordered = False while i > 0 and not heap_ordered : n_i = i - 1 if n_i & 1 : # faster than n_i % 2 n_i -= 1 n_i >>= 1 # faster than n_i // 2 n_val = heap[n_i] if compare(val, n_val) < 0 : heap[i], heap[n_i] = heap[n_i], heap[i] i, val = n_i, heap[n_i] else : heap_ordered = True return heap def pop_heap(heap, compare = cmp) : """ Pops the smallest item from a min-heap. """ heap_size = heap.pop() if not heap_size: return None elif heap_size == 1 : return heap.pop() ret, heap[0] = heap[0], heap.pop() heap_size -= 1 heap += [heap_size] i = 0 heap_ordered = False while not heap_ordered : n_i = 2 * i + 1 if n_i < heap_size : if n_i + 1 < heap_size and compare(heap[n_i],heap[n_i + 1]) > 0 : n_i += 1 if compare(heap[i], heap[n_i]) > 0 : heap[i], heap[n_i] = heap[n_i], heap[i] i = n_i else : heap_ordered = True else : heap_ordered = True return ret def divisors_ordered(factors) : """ Generates an ordered list of all divisors """ factors = [(j,factors.count(j)) for j in sorted(set(factors))] omega = len(factors) fcts =[factors[0][0],[],[]] aux = [fcts, None] for j in xrange(omega) : p,a = factors[j] if omega - j > 1 : aux[1] = [factors[j+1][0],[],[]] else : aux[1] = None for k in xrange(1, a) : b = [p,[],[]] aux[0][1] = b aux[0][2] = aux[1] aux[0] = b aux[0][1] = aux[1] aux[0][2] = aux[1] aux[0] = aux[1] yield 1 div_heap = push_heap([fcts[0],1, fcts]) while div_heap : divisor, pivot, pointer = pop_heap(div_heap) yield divisor if pointer[1] : div_heap = push_heap([divisor * pointer[1][0] , divisor, pointer[1]], div_heap) if pointer[2] : div_heap = push_heap([pivot * pointer[2][0], pivot, pointer[2]], div_heap)

The following graph shows the time required by each of the three implementations presented to produce an ordered list of all divisors of p_{n}#, i.e. the product of the first n prime numbers. Do notice how the lower computational complexity of the heap sorting eventually beats the initial speed advantage of the sorting version. Still, for full lists of divisors, generating all of them and then sorting beats any other approach. It is also worth pointing out how the time required for the heap version of the ordered algorithm eventually settles around being 14 times slower than the unordered generator with delayed sorting, i.e. the computational complexities of both are the same, which probably means that sorting dominates the time of the unordered algorithm.

Just to show why we could want to use an algorithm that is more than ten times slower, the following graph shows the time needed to compute the first divisor larger than 1000 for the first few primorials. While using the heap version of the ordered algorithm time is close to constant, the unordered algorithm shows the same behavior as in the previous test.

It is nevertheless worth noting that the break even point is p_{11}# = 200,560,490,130, so unless you are after the first few divisors of a huge number, the first algorithm presented should always be the choice.