I have an numpy array of size
arr.size = (200, 600, 20). 
I want to compute scipy.stats.kendalltau on every pairwise combination of the last two dimensions. For example:
kendalltau(arr[:, 0, 0], arr[:, 1, 0])
kendalltau(arr[:, 0, 0], arr[:, 1, 1])
kendalltau(arr[:, 0, 0], arr[:, 1, 2])
...
kendalltau(arr[:, 0, 0], arr[:, 2, 0])
kendalltau(arr[:, 0, 0], arr[:, 2, 1])
kendalltau(arr[:, 0, 0], arr[:, 2, 2])
...
...
kendalltau(arr[:, 598, 20], arr[:, 599, 20])
such that I cover all combinations of arr[:, i, xi] with arr[:, j, xj] with i < j and xi in [0,20), xj in [0, 20). This is (600 choose 2) * 400 individual calculations, but since each takes about 0.002 s on my machine, it shouldn't take much longer than a day with the multiprocessing module. 
What's the best way to go about iterating over these columns (with i<j)? I figure I should avoid something like
for i in range(600):
    for j in range(i+1, 600):
        for xi in range(20):
            for xj in range(20):
What is the most numpythonic way of doing this?
Edit: I changed the title since Kendall Tau isn't really important to the question. I realize I could also do something like
import itertools as it
for i, j in it.combinations(xrange(600), 2):
    for xi, xj in product(xrange(20), xrange(20)):
but there's got to be a better, more vectorized way with numpy.