I want to compute the pairwise distances of 57832 vectors. Each vector has 200 dimensions. I am using pdist to compute the distances.
from scipy.spatial.distance import pdist
pairwise_distances = pdist(X, 'cosine')
# pdist is supposed to return a numpy array with shape (57832*57831,).
However, this causes a memory error.
   Traceback (most recent call last):
  File "/home/munichong/git/DomainClassification/NameSuggestion@Verisign/classification_DMOZ/main.py", line 101, in <module>
    result_clustering = clf_clustering.getCVResult(shuffle)
  File "/home/munichong/git/DomainClassification/NameSuggestion@Verisign/classification_DMOZ/ClusteringBasedClassification.py", line 158, in getCVResult
    self.centroids_of_categories(X_train, y_train)
  File "/home/munichong/git/DomainClassification/NameSuggestion@Verisign/classification_DMOZ/ClusteringBasedClassification.py", line 103, in centroids_of_categories
    cat_centroids.append( self.dpc.centroids(X_in_this_cat) )
  File "/home/munichong/git/DomainClassification/NameSuggestion@Verisign/classification_DMOZ/ClusteringBasedClassification.py", line 23, in centroids
    distance_dict, rho_dict = self.compute_distances_and_rhos(X)
  File "/home/munichong/git/DomainClassification/NameSuggestion@Verisign/classification_DMOZ/ClusteringBasedClassification.py", line 59, in compute_distances_and_rhos
    pairwise_distances = pdist(X, 'cosine')
  File "/usr/local/lib/python2.7/dist-packages/scipy/spatial/distance.py", line 1185, in pdist
    dm = np.zeros((m * (m - 1)) // 2, dtype=np.double)
MemoryError
The RAM of my laptop is 16GB. How should I fix it? Or is there any better way?
 
    