I have a text file and would like to create semantic vectors for each word in the file. I would then like to extract the cosine similarity for about 500 pairs of words. What is the best package in R for doing this?
- Are you trying to compute cosine similarity from a plain word representation (based on one-hot encoding), or do you want to use Word2Vec or some other word embeddings? – won782 Jan 17 '18 at 23:48
- Based on word representation. What are the advantages/disadvantages of using Word2Vec? – Namenlos Jan 17 '18 at 23:51
- It really depends on your task and your data. If you only have a small amount of data, say a few paragraphs, the matrix is sparse and could be nearly singular. The advantage of Word2Vec is that it uses pre-learned word embeddings in n-dimensional space. Cosine similarity in Word2Vec space works remarkably well in most cases. – won782 Jan 17 '18 at 23:54
 
2 Answers
If I understand your problem correctly, you want the cosine similarity of two vectors of words. Let us start with the cosine similarity of two words only:
library(stringdist)
d <- stringdist("ca","abc",method="cosine")
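The result is d = 0.1835034, as expected. Note that stringdist returns the cosine distance (1 minus the cosine similarity) between the character q-gram profiles of the two strings, so 0 means identical profiles.
The package also contains a function stringdistmatrix(), which calculates the distance between all pairs of strings. Without a method argument it uses the default OSA edit distance rather than cosine: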
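To see where that number comes from, here is a minimal base-R sketch that recomputes it by hand. It assumes stringdist's default of q = 1, i.e. each string is reduced to counts of single characters; the helper name `char_profile` is made up for this illustration and is not part of the package.

```r
# Count single characters (q-grams with q = 1) over a shared alphabet.
# char_profile is a hypothetical helper, not a stringdist function.
char_profile <- function(s, alphabet) {
  chars <- strsplit(s, "")[[1]]
  as.numeric(table(factor(chars, levels = alphabet)))
}

a <- "ca"
b <- "abc"

# Shared alphabet of both strings: "a", "b", "c"
alphabet <- sort(unique(strsplit(paste0(a, b), "")[[1]]))

va <- char_profile(a, alphabet)  # counts of a, b, c in "ca"
vb <- char_profile(b, alphabet)  # counts of a, b, c in "abc"

# Cosine similarity is the dot product divided by the product of the norms;
# the cosine distance is 1 minus that.
cos_sim  <- sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
cos_dist <- 1 - cos_sim
print(cos_dist)
```

This makes it clear that the measure compares spelling, not meaning: two words score as close when they share characters, regardless of what they mean.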
> d <- stringdistmatrix(c('foo','bar','boo','baz'))
> d
  1 2 3
2 3    
3 1 2  
4 3 1 2
For your purpose, you can simply use something like this:
stringdist(c("ca","abc"),c("aa","abc"),method="cosine")
The results are the distances between ca and aa, and between abc and abc, respectively:
0.2928932 0.0000000
Disclaimer: The library stringdist is brand new (June 2019), but seems to work nicely. I am not associated with the authors of the library.
 
You can use the lsa library. Its cosine() function returns a matrix of cosine similarities. It takes a numeric matrix as input and compares its columns.
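Because the input has to be numeric, the words must first be turned into count vectors. The following is a minimal base-R sketch of that step under some loud assumptions: the three-sentence `docs` corpus is invented for illustration (in practice you would read your file, e.g. with readLines()), tokenization is a naive whitespace split, and `cosine_rows` is a hand-written helper rather than the lsa function. Each word's vector is simply its count in each document.

```r
# Hypothetical toy corpus; replace with the lines of your own text file.
docs <- c("the cat sat on the mat",
          "the dog sat on the log",
          "cats and dogs are pets")

# Naive tokenization: lower-case and split on whitespace.
tokens <- strsplit(tolower(docs), "\\s+")
vocab  <- sort(unique(unlist(tokens)))

# Word-by-document count matrix: one row per word, one column per document.
word_doc <- vapply(tokens,
                   function(tk) as.numeric(table(factor(tk, levels = vocab))),
                   numeric(length(vocab)))
rownames(word_doc) <- vocab

# Cosine similarity between all pairs of rows (hand-written helper).
cosine_rows <- function(m) {
  norms <- sqrt(rowSums(m^2))
  tcrossprod(m) / outer(norms, norms)
}
sim <- cosine_rows(word_doc)

# Look up the similarity for a list of word pairs by name.
pairs <- data.frame(w1 = c("cat", "cat", "the"),
                    w2 = c("mat", "dog", "sat"),
                    stringsAsFactors = FALSE)
pairs$cosine <- sim[cbind(pairs$w1, pairs$w2)]
print(pairs)
```

Since lsa's cosine() compares columns, the transposed matrix t(word_doc) would be the corresponding input there. With only a handful of documents the vectors are very sparse, which is the limitation raised in the comments on the question.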
 
- Error while inserting **strings** into the `cosine()` function. It requires _numeric/complex matrix/vector arguments_ as input. – Abhishek Puri Jun 20 '18 at 06:34
- For a full working example, see e.g. https://stackoverflow.com/questions/34045738 – B--rian Jun 24 '19 at 10:39