Below you can find a thorough explanation of the hardest problem, »GCD Counting«, of the recent »Educational Codeforces Round 45«, together with a Python solution. I found this problem extraordinarily neat, and the official editorial solution teaches a nice number-theoretic trick:

Suppose we are given a set $S$ and an arithmetic function $f : S \to \mathbb{N}$, where our goal is to determine the cardinality $|\{s \in S : f(s) = n\}|$ for various $n$. It may be much more efficient to compute the cardinalities $|\{s \in S : n \mid f(s)\}|$ first and then construct from these what we need.

Let me now explain this cryptic statement in the setting of our problem. The Codeforces problem goes:

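Suppose we are given a tree $T$ with vertices $1, \dots, N$, where the node $i$ is labelled with an integer $a_i$ ($1 \le a_i \le 2 \cdot 10^5$). Let $g(n)$ be the number of non-empty simple paths in the tree such that the gcd of the labels on the path equals $n$ (a single vertex counts as a path, and a path and its reverse are counted once). Find $g(n)$ for all $n$.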
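To make the statement concrete, here is a minimal brute-force sketch that can serve as a reference on tiny trees. It is not the approach developed in this post; the function name `brute_force_gcd_counts` and the 0-indexed edge format are assumptions made for illustration only.

```python
from collections import Counter
from math import gcd


def brute_force_gcd_counts(labels, edges):
    """Count paths by their gcd with a DFS from every start vertex (O(N^2))."""
    n = len(labels)
    adj = [[] for _ in range(n)]
    for x, y in edges:  # edges are 0-indexed pairs in this sketch
        adj[x].append(y)
        adj[y].append(x)
    counts = Counter()
    for start in range(n):
        # iterative DFS carrying the gcd of the path start -> current vertex
        stack = [(start, -1, labels[start])]
        while stack:
            v, parent, d = stack.pop()
            if v >= start:  # count each unordered pair {start, v} once
                counts[d] += 1
            for w in adj[v]:
                if w != parent:
                    stack.append((w, v, gcd(d, labels[w])))
    return dict(counts)


# a path 0 - 1 - 2 with labels 1, 2, 3
print(brute_force_gcd_counts([1, 2, 3], [(0, 1), (1, 2)]))
```

Such a quadratic reference is only useful for checking a faster solution on small inputs.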

As the first grey paragraph suggests, we may take interest in a “weaker” quantity. Let $h(n)$ be the number of non-empty simple paths in the tree such that $n$ divides the gcd of the labels on the path. We will see that computing $h$ is quite easy and that we may construct the values of $g$ from the values of $h$. Indeed, letting $P$ denote the set of primes between 1 and $2 \cdot 10^5$, we can by simple counting (the inclusion-exclusion principle) deduce that:

$$g(n) = h(n) - \sum_{p \in P} h(np) + \sum_{\substack{p, q \in P \\ p < q}} h(npq) - \dots$$

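Note that this is just a formalization of the reasoning: “To count the paths with gcd equal to $n$, we take the paths with $n \mid \gcd$, take away all paths with gcd divisible by $np$ for a prime $p$, then due to overcounting take back all paths with gcd divisible by $npq$ for distinct primes $p, q$, and so on”. Note also that in the above sum, $h$ of any value greater than $2 \cdot 10^5$ will clearly be zero. An experienced eye can turn a sum of this form into a computationally nicer one using the Möbius function. Recall its definition:

$$\mu(n) = \begin{cases} (-1)^k & \text{if } n \text{ is a product of } k \ge 0 \text{ distinct primes,} \\ 0 & \text{if } n \text{ is divisible by the square of a prime.} \end{cases}$$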

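In particular, $\mu(1) = 1$ and $\mu(p_1 p_2 \cdots p_k) = (-1)^k$ for distinct primes $p_1, \dots, p_k$. We may also use, very conveniently, that $\mu(i) = 0$ whenever $i$ is not square-free to “disappear” exactly the terms which we don’t have in the above summation:

$$g(n) = \sum_{i=1}^{\lfloor 2 \cdot 10^5 / n \rfloor} \mu(i) \, h(ni)$$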
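The identity $g(n) = \sum_i \mu(i) \, h(ni)$ can be tried out on something simpler than a tree. The sketch below is an illustration under assumptions: it takes a plain list of integers, lets $h(n)$ be the number of entries divisible by $n$, and recovers the number of entries equal to $n$. The helper names `mobius_sieve` and `exact_from_divisible` are hypothetical.

```python
def mobius_sieve(limit):
    """Return a list mu with mu[n] = Moebius function of n for 1 <= n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0  # index 0 is unused
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for j in range(p, limit + 1, p):
                if j > p:
                    is_prime[j] = False
                mu[j] = -mu[j]  # one more distinct prime factor
            for j in range(p * p, limit + 1, p * p):
                mu[j] = 0  # divisible by a square
    return mu


def exact_from_divisible(h, limit):
    """Given h[n] = #items divisible by n, return g[n] = #items equal to n."""
    mu = mobius_sieve(limit)
    g = [0] * (limit + 1)
    for n in range(1, limit + 1):
        g[n] = sum(mu[i] * h[n * i] for i in range(1, limit // n + 1))
    return g


values = [6, 4, 2, 12, 3, 6]
limit = max(values)
h = [0] * (limit + 1)
for n in range(1, limit + 1):
    h[n] = sum(1 for v in values if v % n == 0)
g = exact_from_divisible(h, limit)
print([(n, g[n]) for n in range(1, limit + 1) if g[n]])
```

Here the "divisible by" counts are computed naively; the point is only that the exact counts fall out of them through the Möbius sum.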

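So how do we compute the values $h(n)$? To find $h(n)$, consider the subgraph $T_n$ of $T$ where we retain only the vertices $i$ where $a_i$ is divisible by $n$. It is not hard to see that for any path, its gcd is divisible by $n$ if and only if all of its vertices lie in $T_n$. A connected component of $T_n$ with $c$ vertices is itself a tree, so it contains exactly $c$ single-vertex paths and $\binom{c}{2}$ longer ones. Thus we have:

$$h(n) = \sum_{C \text{ component of } T_n} \frac{|C| \, (|C| + 1)}{2}$$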
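As a small illustration of this counting step, the following sketch computes $h(n)$ for a single $n$ by keeping only the vertices whose label is divisible by $n$ and summing $c(c+1)/2$ over the components found by a breadth-first search. The function name `count_paths_divisible` and the 0-indexed edge list are assumptions for this example; the full solution further down organises the same idea differently to avoid repeated work.

```python
def count_paths_divisible(n, labels, edges):
    """h(n): number of paths (single vertices included) whose gcd is a multiple of n."""
    size = len(labels)
    keep = [label % n == 0 for label in labels]
    adj = [[] for _ in range(size)]
    for x, y in edges:  # keep only edges with both endpoints retained
        if keep[x] and keep[y]:
            adj[x].append(y)
            adj[y].append(x)
    seen = [False] * size
    total = 0
    for start in range(size):
        if keep[start] and not seen[start]:
            seen[start] = True
            queue = [start]
            for v in queue:  # the list grows while we iterate: a simple BFS
                for w in adj[v]:
                    if not seen[w]:
                        seen[w] = True
                        queue.append(w)
            c = len(queue)  # size of this connected component
            total += c * (c + 1) // 2
    return total


# a path 0 - 1 - 2 - 3 - 4 with labels 6, 4, 2, 3, 9
labels = [6, 4, 2, 3, 9]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print([count_paths_divisible(n, labels, edges) for n in (1, 2, 3)])
```

For $n = 3$ the retained vertices split into the components $\{0\}$ and $\{3, 4\}$, contributing $1 + 3$ paths.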

To detect the connected components of $T_n$, we may just use breadth-first search. Putting the solution together in Python (this solution does not pass the time limit of the Codeforces checker, even though all operations were converted to be array-based; it seems that Python is just too slow for some tasks. If you are after passing the time limit, rewriting the code below in a faster language will get you accepted):

import sys
lim = 2*10**5

# precalculate primes with the Sieve of Eratosthenes
is_prime = [False, False] + [True] * (lim - 1)  # indices 0..lim
for i in range(2, int(lim ** 0.5) + 1):
    if is_prime[i]:
        for j in range(i * i, lim + 1, i):  # multiples of i, starting at i*i
            is_prime[j] = False

# precalculate mobius function
mu = [1 for _ in range(lim+1)]
for p, prime in enumerate(is_prime):
    if prime:
        for j in range(p, lim+1, p):
            mu[j] *= -1
        for j in range(p*p, lim+1, p*p):
            mu[j] = 0

# read input
N, a = int(sys.stdin.readline()), list(map(int,  sys.stdin.readline().split()))
tree = [[] for _ in range(N)]
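for i in range(N-1):
    x, y = map(int, sys.stdin.readline().split())
    # the input is 1-indexed; store the undirected edge with 0-indexed vertices
    tree[x-1].append(y-1)
    tree[y-1].append(x-1)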

# inv_a[n] = list of vertices i such that a_i = n
inv_a = [[] for _ in range(lim + 1)]
for i, v in enumerate(a):
    inv_a[v].append(i)

# helper arrays
visited_by = [0 for _ in range(lim + 1)] # will keep track of visits in bfs
good = [0 for _ in range(lim + 1)] # avoids the need to use (slow) % operation
stack = [0 for _ in range(lim + 1)] # will serve as stack in array-based bfs

h = [0 for _ in range(lim + 1)]
for n in range(1, lim+1):
    for j in range(n, lim+1, n):
        good[j] = n
    for j in range(n, lim+1, n): # j = multiples of n
        for vertex in inv_a[j]:
            if visited_by[vertex] < n: # bfs
                head, tail, component_size = 0, 1, 0
                stack[0] = vertex
                while head < tail:
                    root = stack[head]
                    head += 1
                    visited_by[root] = n
                    component_size += 1
                    for ne in tree[root]:
                        if visited_by[ne] < n and good[a[ne]] == n:
                            stack[tail] = ne
                            tail += 1
                h[n] += component_size * (component_size + 1) // 2

for n in range(1, lim+1):
    g_n = sum(mu[i] * h[n*i] for i in range(1, lim // n + 1))
    if g_n:
        sys.stdout.write('{:d} {:d}\n'.format(n, g_n))

How efficient is the above solution? Consider the worst-case scenario when $N = 2 \cdot 10^5$. The precalculation of primes with the Sieve of Eratosthenes takes $O(N \log \log N)$ time, and sieving the Möbius function as written above takes $O(N \log \log N)$ time as well. The main bottleneck is the calculation of $h$. Let’s first analyze how many operations (cumulatively) are invoked by the lines that come after “for vertex in inv_a[j]:”. Vertex $i$ is reached through inv_a exactly $d(a_i)$ times (the number of divisors of $a_i$), once for every $n$ dividing $a_i$, and hence, while performing the BFS’s, we visit each vertex $i$ and all of its neighbours $d(a_i)$ times. Altogether, this gives $O\left(\sum_i d(a_i) \, (1 + \deg(i))\right)$ operations. We may upper-bound this further using $d(a_i) \le D$, where $D = \max_{m \le N} d(m)$, and $\sum_i \deg(i) = 2(N - 1)$, resulting in the bound $O(N D)$. Next, anything that comes before “for vertex in inv_a[j]:” cumulatively results in $O(N \log N)$ operations, since $\sum_n N/n$ is a harmonic sum. The final few lines computing all values $g(n)$ again result in $O(N \log N)$ operations. The overall complexity is $O(N \log N + N D)$.
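How large the divisor counts get in practice can be checked directly. The sketch below is an illustration only (the helper name `divisor_counts` is hypothetical): it tabulates the number of divisors of every integer up to a limit with the same "loop over all multiples" pattern as the main solution, so its running time is the harmonic sum of about `limit * log(limit)` steps.

```python
def divisor_counts(limit):
    """Return a list d with d[m] = number of divisors of m for 1 <= m <= limit."""
    d = [0] * (limit + 1)
    for n in range(1, limit + 1):
        for j in range(n, limit + 1, n):  # n divides every multiple j
            d[j] += 1
    return d


d = divisor_counts(100)
print(max(d))  # the largest divisor count among 1..100
```

Calling `max(divisor_counts(2 * 10**5))` gives the worst-case number of times a single vertex can be visited by the searches in the solution above.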

As the last remark, we note that in case , we have .