I have two documents, for example:
Doc1 = {'python','numpy','machine learning'}
Doc2 = {'python','pandas','tensorflow','svm','regression','R'}
I also know the similarity (correlation) of each pair of words, e.g.:
Sim('python','python') = 1
Sim('python','pandas') = 0.8
Sim('numpy', 'R') = 0.1
What is the best way to measure the similarity of the two documents?
It seems that the traditional Jaccard distance and cosine distance are not good metrics in this situation, because they only count exact word matches and ignore the pairwise word similarities.
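
To illustrate the problem, here is a minimal sketch of exact-match Jaccard similarity on the two example documents. The names doc1, doc2 and jaccard_similarity are only for illustration, and returning 1.0 for two empty sets is an assumed convention.

```python
# Example documents from above, as sets of terms.
doc1 = {'python', 'numpy', 'machine learning'}
doc2 = {'python', 'pandas', 'tensorflow', 'svm', 'regression', 'R'}


def jaccard_similarity(a, b):
    """Exact-match Jaccard similarity: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(union)


score = jaccard_similarity(doc1, doc2)
print(score)  # 0.125, since only 'python' is shared among 8 distinct terms
```

The similarity is 0.125 (a Jaccard distance of 0.875), and it would be exactly the same if Sim('python','pandas') were 0 instead of 0.8, because the known word-to-word similarities never enter the calculation.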