I have a dataframe df with the columns id, text, lang, stemmed, and tf_idfresult. df has 24 rows. From the tf-idf result I computed a dissimilarity matrix (distance matrix), which gives how dissimilar each pair of rows in the dataframe is.
A sample of the dataframe:
   id   text                    lang  stemmed                     tf_idfresult
0  234  Hi this                 en    [hi, this]                  [0.0, 0.2]
1  232  elephants ruined again  en    [elephants, ruined, again]  [0.1, 0.0, 0.0]
2  441  there are palm trees    en    [there, are, palm, trees]   [0.2, 0.54, 0.0, 0.823]
3  235  so much to do           en    [so, much, to, do]          [0.1, 0.1, 0.0, 0.0]
The dissimilarity matrix dis was computed with the help of the cosine_similarity function and looks like this:
[[0.0, 0.3, 0.1, 1, 1...]
[0.1, ...]
.
.
for 24 rows and 24 columns.
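For reference, here is a minimal sketch of how a matrix like this can be built, assuming dissimilarity is taken as 1 minus cosine similarity and that every tf-idf vector has the same length (one entry per vocabulary term). It is written in plain Python so the arithmetic is visible. The function name cosine_dissimilarity_matrix and the three vectors are made up for illustration and are not my real data.

```python
import math

def cosine_dissimilarity_matrix(vectors):
    """Return the matrix of 1 - cosine similarity for equal-length vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    dis = [[0.0] * n for _ in range(n)]  # the diagonal stays 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            # Assumption: an all-zero vector has zero similarity to everything.
            sim = dot / (norms[i] * norms[j]) if norms[i] and norms[j] else 0.0
            dis[i][j] = dis[j][i] = 1.0 - sim  # the matrix is symmetric
    return dis

# Hypothetical tf-idf vectors over a shared 3-term vocabulary.
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
dis = cosine_dissimilarity_matrix(vectors)
```

With scikit-learn, the same idea would presumably be 1 - cosine_similarity(X) for a matrix X of tf-idf rows.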
I used the silhouette method and found that the best value for k is 3. I tried
pam = kmedoids(dis, initialmedoids)
but I don't know how to find the initial medoids. The expected output is the dataframe split into three clusters. I don't need any specific format for the output.
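To make the question concrete, here is a rough sketch of what I understand the initial medoids to be: k distinct row indices of the distance matrix, which could simply be drawn at random. The sketch then runs a simple alternating k-medoids update on a precomputed distance matrix. This is not the full PAM swap search and not the API of any particular library. The names initial_medoids, assign and k_medoids, the seed, and the 6x6 toy matrix are all assumptions for illustration.

```python
import random

def initial_medoids(n_points, k, seed=0):
    """Pick k distinct row indices to act as starting medoids."""
    return random.Random(seed).sample(range(n_points), k)

def assign(dis, medoids):
    """Label each point with the position (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dis[i][medoids[c]])
            for i in range(len(dis))]

def k_medoids(dis, medoids, max_iter=100):
    """Alternate between assigning points and re-picking each cluster's medoid."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(dis, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(old)  # keep the old medoid for an empty cluster
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dis[m][j] for j in members)))
        if new_medoids == medoids:
            break  # no medoid changed, so the assignment is stable
        medoids = new_medoids
    return medoids, assign(dis, medoids)

# Toy distance matrix: six points on a line forming two well-separated groups.
positions = [0, 1, 2, 10, 11, 12]
toy_dis = [[abs(a - b) for b in positions] for a in positions]

start = initial_medoids(len(toy_dis), k=2)
medoids, labels = k_medoids(toy_dis, start)
```

For my data this would be called with the 24x24 matrix dis and k=3, and the resulting labels could be attached to the dataframe as a new column, for example df["cluster"] = labels.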
 
    