Let's assume the co-occurrence matrix is given as a list of lists:
com = [[5, 1, 0],
[1, 3, 2],
[0, 2, 3]]
n_elem = len(com)
The Jaccard similarity of two sets A and B is given by |A ∩ B| / |A ∪ B|. The co-occurrence matrix gives the value of |A|, |B|, and |A ∩ B|. The value of |A ∪ B| is simply |A| + |B| - |A ∩ B|, which we can find the Jaccard index.
First, let's create a list of lists containing ones that is the same size as com. The default value is 1 because the similarity index of a set with itself is 1, and we will not calculate these elements:
similarity = [[1 for _ in row] for row in com]
Now, we can loop over each pair of values in com and calculate the similarities. The inner loop starts at i+1 because similarity[i][j] is identical to similarity[j][i], so we only need to calculate the upper triangle of the matrix:
for i in range(n_elem):
a = com[i][i] # |A|
for j in range(i+1, n_elem):
b = com[j][j] # |B|
aib = com[i][j] # |A ∩ B|
aub = a + b - aib # |A ∪ B|
# Set both off-diagonal elements simultaneously
similarity[i][j] = similarity[j][i] = aib / aub
This leaves us with the following similarity matrix:
[[1 , 0.14285714285714285, 0.0],
[0.14285714285714285, 1 , 0.5],
[0.0 , 0.5 , 1]]
Now, if your co-occurrence matrix is a numpy array (or you're open to using numpy), you can speed up this computation by outsourcing the loops to numpy's C backend.
import numpy as np
com_arr = np.array([[5, 1, 0],
[1, 3, 2],
[0, 2, 3]])
n_elem = com_arr.size
First, we can get the occurrence of each element using the diagonal of the matrix:
occ = np.diag(com_arr) # array([5, 3, 3])
Next, create the matrix of |A ∪ B|. Remember that |A ∩ B| is already specified by com_arr:
aub = occ[:, None] + occ[None, :] - com_arr
Since occ is a 1-d array, adding a None index will create a 2-d array of one column (a column vector of shape (3, 1)) and one row (a row vector of shape (1, 3)) respectively. When adding a row vector to a column vector, numpy automatically broadcasts the dimensions so that you end up with a (in this case) square matrix of shape (3, 3). Now, aub looks like this:
array([[5, 7, 8],
[7, 3, 4],
[8, 4, 3]])
Finally, divide the intersection by the union:
similarity = com_arr / aub
et voila, we have the same values as before:
array([[1. , 0.14285714, 0. ],
[0.14285714, 1. , 0.5 ],
[0. , 0.5 , 1. ]])