I'm working on an NLP task and I need to calculate the co-occurrence matrix over documents. The basic formulation is as follows:
Here I have a matrix with shape (n, length), where each row represents a sentence composed of length words, so there are n sentences, all of the same length. Then, given a context size, e.g., window_size = 5, I want to calculate the co-occurrence matrix D, where the entry in the c-th row and w-th column is #(w,c), i.e., the number of times the context word c appears in the context of word w.
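To pin down the definition above, here is a plain nested-loop sketch. It rests on two assumptions that are mine, not part of the question: the matrix holds integer word ids in [0, vocab_size), and window_size counts the words on each side of the center word (the center word itself is not counted). The function name cooccurrence_loops is hypothetical.

```python
import numpy as np

def cooccurrence_loops(sentences, vocab_size, window_size):
    """Reference nested-loop version.

    Assumptions: `sentences` is an (n, length) array of integer word ids in
    [0, vocab_size), and `window_size` is the number of words on EACH side
    of the center word.
    """
    sentences = np.asarray(sentences)
    n, length = sentences.shape
    D = np.zeros((vocab_size, vocab_size), dtype=np.int64)
    for row in sentences:
        for i in range(length):
            w = row[i]  # center word
            lo = max(0, i - window_size)
            hi = min(length, i + window_size + 1)
            for j in range(lo, hi):
                if j != i:
                    c = row[j]  # context word
                    D[c, w] += 1  # row = context word c, column = center word w
    return D
```

For example, with sentences [[0, 1, 2], [2, 1, 0]], vocab_size=3 and window_size=1, every adjacent pair is seen once per direction in each sentence, so D comes out as [[0, 2, 0], [2, 0, 2], [0, 2, 0]].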
An example can be found here: How to calculate the co-occurrence between two words in a window of text?
I know it can be calculated with nested loops, but I want to know whether there exists a simpler way or a ready-made function. I have found some answers, but they do not work with a window sliding through the sentence, for example: word-word co-occurrence matrix
So could anyone tell me whether there is a function in Python that can handle this problem concisely? I think this task is quite common in NLP.
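One way to make this more concise is to loop only over the window offsets instead of over every word. For each offset k, slicing gathers all (center, context) pairs at distance k across all n sentences at once, and np.add.at counts them; unlike D[idx] += 1, np.add.at accumulates repeated index pairs correctly. This is a sketch under the same assumptions as before (integer word ids in [0, vocab_size), window_size words on each side of the center), and cooccurrence is a hypothetical name, not a library function.

```python
import numpy as np

def cooccurrence(sentences, vocab_size, window_size):
    """Vectorized co-occurrence counts with a symmetric sliding window.

    Assumptions: `sentences` is an (n, length) array of integer word ids in
    [0, vocab_size), and `window_size` is the number of words on EACH side
    of the center word. D[c, w] = #(w, c).
    """
    sentences = np.asarray(sentences)
    n, length = sentences.shape
    D = np.zeros((vocab_size, vocab_size), dtype=np.int64)
    for offset in range(1, window_size + 1):
        if offset >= length:
            break  # no pairs at this distance or beyond
        left = sentences[:, :-offset].ravel()   # word at position i
        right = sentences[:, offset:].ravel()   # word at position i + offset
        # Center at i, context at i + offset: row = context, column = center.
        np.add.at(D, (right, left), 1)
        # Center at i + offset, context at i.
        np.add.at(D, (left, right), 1)
    return D
```

If window_size = 5 should instead mean five words in total (two on each side of the center), call it with window_size=2. For a large vocabulary a dense vocab_size by vocab_size array may not fit in memory; in that case the same row and column index arrays can be passed to scipy.sparse.coo_matrix, which sums duplicate entries when converted to CSR.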