Basically, I want to reimplement the approach shown in this video.
Given a corpus of documents, I want to find the terms that are most similar to each other.
I was able to generate a co-occurrence matrix using this SO thread, and I used the video to generate an association matrix from it. Next, I would like to generate a second-order co-occurrence matrix.
Problem statement: Consider a matrix M in which each row corresponds to a term and the entries of that row are the IDs of the top k terms most similar to it. For example, with k = 4 and n terms in the dictionary, M has n rows and 4 columns.
HAVE:
M = [[18,34,54,65],   # Term IDs similar to Term t_0
     [18,12,54,65],   # Term IDs similar to Term t_1
     ...
     [21,43,55,78]]   # Term IDs similar to Term t_n.
So M contains, for each term ID, the IDs of its most similar terms. Now I would like to check how many of those similar terms match between two rows. In the example of M above, terms t_0 and t_1 appear quite similar because three out of four term IDs match, whereas terms t_0 and t_n are not similar because none match. Let's write M as a series of lists.
M = [list_0,   # Term IDs similar to Term t_0
     list_1,   # Term IDs similar to Term t_1
     ...
     list_n]   # Term IDs similar to Term t_n.
WANT:
C = [[f(list_0, list_0), f(list_0, list_1), ..., f(list_0, list_n)],
     [f(list_1, list_0), f(list_1, list_1), ..., f(list_1, list_n)],
     ...
     [f(list_n, list_0), f(list_n, list_1), ..., f(list_n, list_n)]]
I'd like to find the matrix C whose elements are a function f applied to pairs of lists from M. f(a, b) measures the degree of similarity between two lists a and b. Going with the example above, the degree of similarity between t_0 and t_1 should be high, whereas the degree of similarity between t_0 and t_n should be low.
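To make the target concrete, here is a minimal sketch of the computation. It assumes, purely as a placeholder, that f is the fraction of shared term IDs (the "three out of four terms match" reading from the example above). The names `overlap` and `second_order_matrix` are hypothetical, and this f ignores ordering, so it is not necessarily the right choice.

```python
def overlap(a, b):
    """Placeholder f: fraction of term IDs shared by two top-k lists.

    Assumption: order within the lists is ignored and both lists have length k.
    """
    return len(set(a) & set(b)) / len(a)


def second_order_matrix(M, f=overlap):
    """Apply f to every pair of rows of M and return the result as nested lists."""
    return [[f(row_i, row_j) for row_j in M] for row_i in M]


# Small version of the example M (only three rows shown)
M = [[18, 34, 54, 65],   # Term IDs similar to term t_0
     [18, 12, 54, 65],   # Term IDs similar to term t_1
     [21, 43, 55, 78]]   # Term IDs similar to term t_n

C = second_order_matrix(M)
for row in C:
    print(row)
```

With this placeholder, the entry for (t_0, t_1) is 0.75 and the entry for (t_0, t_n) is 0.0. The double loop is quadratic in the number of terms, so it is only meant to illustrate the shape of C, not to scale to a large dictionary.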
My questions:
- What is a good choice for comparing the ordering of two lists? That is, what is a good choice for the function f?
- Is there a transformation already available that takes a matrix like M as input and produces a matrix like C? Preferably a Python package?
Thank you, r0f1