As far as I know, pre-trained models work well as feature extractors in many tasks, thanks to the large datasets they were trained on.
However, I'm wondering whether such a model, say VGG-16,
has the ability to extract "semantic" information from an input image.
If so, given an unlabeled dataset,
is it possible to cluster the images by measuring the semantic similarity of their extracted features?
Here is what I've tried so far:
- Load the pre-trained VGG-16 through PyTorch.
- Load the CIFAR-10 dataset and transform it into a batched tensor X of size (5000, 3, 224, 224).
- Adjust vgg.classifier so that its output dimension is 4096.
- Extract features (the full pipeline is sketched after this list):

      def extract_features(vgg, X):                        # X: (5000, 3, 224, 224)
          features = vgg.features(X).view(X.shape[0], -1)  # (5000, 25088)
          features = vgg.classifier(features)              # (5000, 4096)
          return features
- Try out cosine similarity, the inner product, and torch.cdist as similarity/distance measures (also sketched below), only to find several bad clusters.
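For concreteness, here is a minimal end-to-end sketch of the steps above. The batch size, the use of the test split, and dropping only the final linear layer of the classifier are illustrative choices, not necessarily the best ones:

    import torch
    import torchvision
    from torchvision import transforms

    # Pre-trained VGG-16 in eval mode (so dropout layers are inactive).
    # Newer torchvision versions use weights=... instead of pretrained=True.
    vgg = torchvision.models.vgg16(pretrained=True).eval()
    # Drop the final Linear(4096, 1000) so the classifier outputs 4096-d features.
    vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])

    # CIFAR-10 images upsampled to 224x224 and normalized with ImageNet stats.
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    dataset = torchvision.datasets.CIFAR10(root="./data", train=False,
                                           download=True, transform=transform)
    loader = torch.utils.data.DataLoader(dataset, batch_size=64)

    feats = []
    with torch.no_grad():
        for X, _ in loader:                           # labels unused (unlabeled setting)
            f = vgg.features(X).view(X.size(0), -1)   # (B, 25088)
            feats.append(vgg.classifier(f))           # (B, 4096)
    features = torch.cat(feats)                       # (N, 4096)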
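And this is roughly how I compare the features and try to group the images. The k-means call (via scikit-learn) is just one concrete example of what I mean by "clustering", not a fixed choice; on L2-normalized features, Euclidean k-means is closely related to grouping by cosine similarity:

    import torch
    from sklearn.cluster import KMeans

    # features: (N, 4096) from the extraction step above.
    f = torch.nn.functional.normalize(features, dim=1)   # unit-norm rows

    cos_sim = f @ f.t()                        # (N, N) pairwise cosine similarities
    inner   = features @ features.t()          # (N, N) raw inner products
    dists   = torch.cdist(features, features)  # (N, N) Euclidean distances

    # Example clustering: k-means on the normalized features, k = 10 for CIFAR-10.
    labels = KMeans(n_clusters=10, n_init=10).fit_predict(f.numpy())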
Any suggestions? Thanks in advance.