I want to visualize the similarity of text documents. I am using scikit-learn's TfidfVectorizer as tfidf = TfidfVectorizer(decode_error='ignore', max_df=3).fit_transform(data)
and then computing cosine similarity as cosine_similarity = (tfidf*tfidf.T).toarray()
This gives a similarity matrix, but sklearn.manifold.MDS needs a dissimilarity matrix. When I pass 1-cosine_similarity, the diagonal values, which should be zero, are instead small nonzero values such as 1.12e-9. Two questions:
1) How do I use a similarity matrix with MDS, or how do I convert my similarity matrix into a dissimilarity matrix?
2) MDS has an option dissimilarity, which can be 'precomputed' or 'euclidean'. What is the difference between the two? When I pass 'euclidean', the MDS coordinates come out the same regardless of whether I use cosine_similarity or 1-cosine_similarity, which looks wrong.
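
A minimal sketch of the setup described above, for reproducing the issue. The toy corpus is made up and stands in for data. Clipping negatives and zeroing the diagonal is only one possible workaround for the rounding error, an assumption rather than an established fix. The @ operator is used for the sparse matrix product in place of *, which should give the same result here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import MDS

# Hypothetical toy corpus standing in for `data` from the question.
data = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "stock markets fell sharply today",
]

# Same vectorizer settings as in the question.
tfidf = TfidfVectorizer(decode_error='ignore', max_df=3).fit_transform(data)

# TfidfVectorizer L2-normalizes rows by default, so the row-by-row
# dot product is the cosine similarity.
similarity = (tfidf @ tfidf.T).toarray()

# Convert similarity to dissimilarity.
dissimilarity = 1.0 - similarity

# Possible workaround (an assumption): remove floating-point noise by
# clipping tiny negative values and forcing the diagonal to exactly zero.
np.clip(dissimilarity, 0.0, None, out=dissimilarity)
np.fill_diagonal(dissimilarity, 0.0)

# 'precomputed' tells MDS to treat the input as a dissimilarity matrix.
mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
coords = mds.fit_transform(dissimilarity)
print(coords.shape)
```

With this corpus, dissimilarity is a 4x4 symmetric matrix with an exactly zero diagonal, and coords holds one 2-D point per document.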
Thanks!