When I wrote the first post of the FFT series I confessed it was out of boredom from number theory. That statement got me into a coffee break discussion with a friend and mathematician, who claimed that the FFT is as number theory as it gets, and went on to explain why. Since I usually don’t understand a word of what he is saying when he gets started on one of this rants, which invariably lead to Riemann’s zeta function, I just pretended to acknowledge his doubts, and then forgot about the conversation altogether.

Well, a few weeks down the road it seems he was right: while trying to make my FFT code work, I have found myself dealing with all sorts of number theoretical stuff, ranging from simple ones, like factorization, to arcane, as the Chinese remainder theorem. And since I have this thing about complete expositions, I’m afraid that this interleaving of apparently unrelated number theory posts with the FFT ones is going to be the norm. As a matter of fact the odd post on the Euclidean algorithm was supposed to be a stepping stone to an improved radix 2 FFT, which I finally managed to do without.

But lets get going with today’s subject: computing the divisors of a positive integer. May not seem a very sexy topic at first glance, but there’s beauty in it, as the striking Divisor Plot website put together by Jeffrey Ventrella proves…

Getting into the algorithmic treatment of the problem, there is of course the dumber approach, on the line of “to compute all divisors of a number, do trial division by all numbers smaller than it.” I will not pursue that line of thinking though, and will focus on computing the divisors once the prime factorization of the number is known, which is the efficient way of dealing with this problem.

So I will assume that we know the expansion of the number into its prime factors as

,

where the function ω(n) represents the number of distinct prime factors, p_{i}, each appearing with the corresponding exponent, a_{i}.

A divisor will have a subset of the prime factors of n as its own factors, so we could represent any divisor as a list of ω(n) numbers, the i-th ranging from 0 up to and including a_{i}. This scheme makes it very easy to calculate the total number of divisors as

.

**Unordered Generation of Divisors**

Furthermore, it points out that the computation of all the divisors could be achieved using ω(n) nested loops. Of course the number of loops to nest will vary from input to input, so recursion is probably the best of choices, or so thinks Tim Peters, as the code below has only cosmetic modifications to what he posted on the Python-list@python.org mailing list:

def divisors(factors) : """ Generates all divisors, unordered, from the prime factorization. """ ps = sorted(set(factors)) omega = len(ps) def rec_gen(n = 0) : if n == omega : yield 1 else : pows = [1] for j in xrange(factors.count(ps[n])) : pows += [pows[-1] * ps[n]] for q in rec_gen(n + 1) : for p in pows : yield p * q for p in rec_gen() : yield p

This approach, while extremely fast, is not much use if you want to produce the divisors sorted, as you will then have to first generate and store all divisors, then sort them.

**Ordered Generation of Divisors**

If what you are after is not the full set of divisors, or any arbitrary subset of them, but something like “the first divisor larger than x” or “the last divisor smaller than x,” it may be worth trying other algorithms which, while being less efficient, manage to produce the divisors properly sorted.

There’s little literature on the subject, so its hard to find a valid benchmark . Even PARI/GP, which is the golden standard on computational number theory seems, from the little sense I have managed to make out of their Subversion repository, to be generating all divisors (in a similar manner to above) and then sorting them.

So here’s a calculation scheme that can be effectively modified to yield ordered divisors… We start with a pivot value of 1, and an ordered list of all prime factors. Before every factor we split our flow in two:

- one branch skips the factor, and proceeds to the first factor different from the current one, keeping the pivot unchanged
- the other branch first updates the pivot, multiplying it by the factor, yields the result, and then proceeds to the next factor, with the modified pivot.

The flow is probably easier to understand with the example below, showing the branching when computing the divisors of 540 = 2^{2}·3^{3}·5. Blue arrows skip to the first different factors, red ones go to the next factors, a line going through a factor implies multiplication by it…

To keep track of this structure, I’m going to get fancy, and will build a tree-like data structure, where each item will be a list, holding the factor in the first position, and pointers, i.e. nested lists, to the next elements in the second and third ones. Again, to make things clear, here’s the data structure for 540, using the same color code as before…

Rather than traversing this structure recursively, we will approach it in a more structured way, by keeping a separate list of factors that can be visited in a next step. In the example above, when we start out, we can visit the first factor (a 2), or skip to the third (the first 3), so the list will hold two elements. If we select going to the first 2, we then have two new options, i.e. going on to the second 2, or taking the shortcut to the first three. Note that the two instances point at the first three are different, since once has a pivot value of 1 (and will thus generate the divisor 3), while the other has it set to 2 (an will generate the divisor 6).

If we don’t care about the other of divisors, this method can even be implemented by hand, by keeping a list of pivots and remaining factors, and on each step erasing the first element of the list and generating a divisor, plus adding up to two new items at the bottom. The first steps, plus the resulting list of divisors, are worked out below, again for the case of n = 540.

Now, while when doing it on paper there is not much more choice other than processing the items sequentially, we could choose to process always the element that produces the smallest divisor, and thus produce them in order. The simplest implementation of this idea would simply use a sort on the list of processable items before yielding the next divisor…

def divisors_ordered(factors) : """ Generates all divisors, ordered, from the prime factorization """ factors = [(j,factors.count(j)) for j in sorted(set(factors))] omega = len(factors) fcts =[factors[0][0],[],[]] aux = [fcts, None] for j in xrange(omega) : p,a = factors[j] if omega - j > 1 : aux[1] = [factors[j+1][0],[],[]] else : aux[1] = None for k in xrange(1, a) : b = [p,[],[]] aux[0][1] = b aux[0][2] = aux[1] aux[0] = b aux[0][1] = aux[1] aux[0][2] = aux[1] aux[0] = aux[1] div_tree = [[fcts[0],1, fcts]] yield 1 while div_tree : div_tree.sort() divisor, pivot, pointer = div_tree.pop(0) yield divisor if pointer[1] : div_tree += [[divisor * pointer[1][0] , divisor, pointer[1]]] if pointer[2] : div_tree += [[pivot * pointer[2][0], pivot, pointer[2]]]

Of course doing a full sort before yielding every divisor kills the efficiency of this algorithm. The fancy (or should I simply say correct?) way of keeping an ordered list, popping the smallest item only, and pushing arbitrary elements, is with a priority queue, which I will implement as a binary heap. Since we will be building our heap from scratch, we only need to worry about push and pop functions.

def push_heap(item, heap = None, compare = cmp) : """ Pushes an item into a min-heap, or creates one. """ heap_size = None if not heap : heap = [item, 1] else : heap_size = heap.pop() heap += [item, heap_size + 1] i = heap[-1] - 1 val = heap[i] heap_ordered = False while i > 0 and not heap_ordered : n_i = i - 1 if n_i & 1 : # faster than n_i % 2 n_i -= 1 n_i >>= 1 # faster than n_i // 2 n_val = heap[n_i] if compare(val, n_val) < 0 : heap[i], heap[n_i] = heap[n_i], heap[i] i, val = n_i, heap[n_i] else : heap_ordered = True return heap def pop_heap(heap, compare = cmp) : """ Pops the smallest item from a min-heap. """ heap_size = heap.pop() if not heap_size: return None elif heap_size == 1 : return heap.pop() ret, heap[0] = heap[0], heap.pop() heap_size -= 1 heap += [heap_size] i = 0 heap_ordered = False while not heap_ordered : n_i = 2 * i + 1 if n_i < heap_size : if n_i + 1 < heap_size and compare(heap[n_i],heap[n_i + 1]) > 0 : n_i += 1 if compare(heap[i], heap[n_i]) > 0 : heap[i], heap[n_i] = heap[n_i], heap[i] i = n_i else : heap_ordered = True else : heap_ordered = True return ret def divisors_ordered(factors) : """ Generates an ordered list of all divisors """ factors = [(j,factors.count(j)) for j in sorted(set(factors))] omega = len(factors) fcts =[factors[0][0],[],[]] aux = [fcts, None] for j in xrange(omega) : p,a = factors[j] if omega - j > 1 : aux[1] = [factors[j+1][0],[],[]] else : aux[1] = None for k in xrange(1, a) : b = [p,[],[]] aux[0][1] = b aux[0][2] = aux[1] aux[0] = b aux[0][1] = aux[1] aux[0][2] = aux[1] aux[0] = aux[1] yield 1 div_heap = push_heap([fcts[0],1, fcts]) while div_heap : divisor, pivot, pointer = pop_heap(div_heap) yield divisor if pointer[1] : div_heap = push_heap([divisor * pointer[1][0] , divisor, pointer[1]], div_heap) if pointer[2] : div_heap = push_heap([pivot * pointer[2][0], pivot, pointer[2]], div_heap)

The following graph shows the time required by each of the three implementations presented to produce an ordered list of all divisors of p_{n}#, i.e. the product of the first n prime numbers. Do notice how the lower computational complexity of the heap sorting eventually beats the initial speed advantage of the sorting version. Still, for full lists of divisors, generating all of them and then sorting beats any other approach. It is also worth pointing out how the time required for the heap version of the ordered algorithm eventually settles around being 14 times slower than the unordered generator with delayed sorting, i.e. the computational complexities of both are the same, which probably means that sorting dominates the time of the unordered algorithm.

Just to show why we could want to use an algorithm that is more than ten times slower, the following graph shows the time needed to compute the first divisor larger than 1000 for the first few primorials. While using the heap version of the ordered algorithm time is close to constant, the unordered algorithm shows the same behavior as in the previous test.

It is nevertheless worth noting that the break even point is p_{11}# = 200,560,490,130, so unless you are after the first few divisors of a huge number, the first algorithm presented should always be the choice.

In the first graph ‘unordered then sort’ is faster than ‘ordered with heap’. Could that be because Python sort is implemented in C whereas your heap is implemented in Python, so it is telling us that C is faster than Python – which we knew already!

Python supplies a heap / priority queue in the ‘heapq’ module. It is implemented in C and is, I assume, quite fast.

I don’t think so, Steve… The algorithm to generate divisors in a sortable manner is slower than the first one, even if you avoid the sorting altogether. If you extract the potential divisors from the end and not the beginning of the list, it seems to have the same computational complexity as the first one, but runs somewhere between 1.5 and 2 times slower.

Using heapq instead of my own, things improve, specially at the lower runtime end, but overall for large inputs the performance is still 3 to 5 times slower than sorting after generation, not during.

I nevertheless feel that using data structures without building them with your bare hands is close to cheating… 😉

Despite the better performance, it’s funny that the break even point to generate the first divisor larger than 1000 is not altered much: it is now around p9# = 223,092,870, which is still a huge number for the practical purposes I have in mind.

Python supplies a heap / priority queue in the ‘heapq’ module. It is implemented in C and is, I assume, quite fast.Apparently not on all systems. I once read a blog post where someone used the heapq module for sorting, and I couldn’t duplicate his results. I emailed the author, and he discovered that it worked well on one computer of his, but the results were quite different in the other (which was also running a different OS).

Turned out one had the module compiled into C. The other was plain Python.

Beetle B,

I presume it was because the versions of Python were different.

Python 2.3 introduced the new heapq module, implemented in Python.

In Python 2.4 the heapq module was converted to C resulting in a tenfold improvement in speed.

I doubt that was the reason, as we were both using 2.5. I was on Linux, and he was, I believe on a Mac.

[…] repeating) list of primes. If I’d been paying slightly more attention, he’s got a lengthy post about computing the number of divisors for an integer as […]

This is a very useful algorithm, and should come in handy in solving some projecteuler problems. Thanks for the great explanation!

Could you tell us what program you use to draw those beautiful-looking diagrams (of the linked structures)? They look very natural. Or did you draw them by hand?

Is there a mixup in the coloring and labeling of the lines in the two plots, that is, does BLUE (ordered with sort) in the next-to-last become ORANGE (unordered then sort) in the last, while GREEN (ordered with heap) stays the same? I ask this because the blue line in the next-to-last plot and the orange line in the last both have the hockey-stick shape and both cross the green line at about p-sub-11-#. I see that the elapsed times are different in the two problems. Just surprised that orange should have acquired such a pronounced, late upwards break … Your text is consistent with the labeling in the last plot, so I guess if there is a mixup it occurred in the next-to-last plot.

Nice and interesting post! Both recursive and priority queue based version are beutifull. Will definietly use it from time to time 🙂

Hello again.

I rewritten (unsorted version) it into C (and added few other optimalisation). I needed divisors for determining an order of element a in finite field Z/pZ. (order of element a is smallest positive number k, such that a^k mod p == 1, and I know it is true for k==p-1, via Fermat’s theorem, but it is not nacassarly smallest k. so need to check also divisors of p-1, because if a^k == a^(k1*k2) == (a^k1)^k2 == 1^k2 , then we see k1 (a divisor of k), or its divisor, can be an order of a )

You can find it at http://pastebin.com/tMBTYSx7

It is small, fast and interesting code I think.

I will also implement ordered heap to check if it faster, but for my purpose it can be beneficial to also use identity a^(x*y) = (a^x)^y first, as it will be faster to perform powpod probably incrementally, than checking from smallest ones (but early termination in ordered heap will be really nice for me also).