I would like to evaluate the performance of a model pipeline. I am not training the model on the ground-truth labels that I am evaluating the pipeline against, so a cross-validation scheme is unnecessary. However, I would still like to use the grid search functionality provided by sklearn.
Is it possible to use sklearn.model_selection.GridSearchCV without splitting the data? In other words, I would like to run the grid search and get scores computed on the full dataset that I pass into the pipeline.
Here is a simple example:
I might wish to choose the optimal k for KMeans. I am actually going to be using KMeans on many datasets that are similar in some sense. It so happens that I have ground-truth labels for a few such datasets, which I will call my "training" data. So, instead of using something like BIC, I decide simply to pick the k that performs best on my training data and employ that k for future datasets. Searching over possible values of k is a grid search, and since KMeans is available in sklearn, I can very easily define a grid search over this model. Conveniently, KMeans.fit accepts a y argument that it simply ignores, so GridSearchCV can pass the ground-truth labels through to a scorer. However, there is no sense in doing cross-validation here: the individual KMeans models never see the ground-truth labels during fitting, so they are incapable of overfitting to them.
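Concretely, the setup I have in mind looks roughly like this (a minimal sketch; the random data and the adjusted_rand_score scorer are placeholders I chose for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Stand-in "training" data: features plus ground-truth cluster labels.
X = np.random.rand(200, 5)
y_true = np.random.randint(0, 4, size=200)

# Score predicted cluster assignments against the ground-truth labels;
# KMeans itself never sees y_true during fitting.
scorer = make_scorer(adjusted_rand_score)

search = GridSearchCV(
    KMeans(n_init=10),
    param_grid={"n_clusters": list(range(2, 10))},
    scoring=scorer,
    # cv defaults to 5-fold here, which is exactly the splitting I want to avoid
)
search.fit(X, y_true)
```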
To be clear, the above is a contrived example meant to justify a possible use case, for anyone worried that I might abuse this functionality. What I am interested in is how to keep GridSearchCV from splitting the data in the first place.
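One workaround I have considered is passing cv a single (train, test) split in which both index arrays cover the entire dataset, so that each candidate k is fit and scored on the full data, but I do not know whether this is the intended approach:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, make_scorer
from sklearn.model_selection import GridSearchCV

X = np.random.rand(200, 5)
y_true = np.random.randint(0, 4, size=200)

# A single "split" whose train and test halves are both the full dataset,
# so nothing is actually held out.
indices = np.arange(len(X))
no_split_cv = [(indices, indices)]

search = GridSearchCV(
    KMeans(n_init=10),
    param_grid={"n_clusters": list(range(2, 10))},
    scoring=make_scorer(adjusted_rand_score),
    cv=no_split_cv,
)
search.fit(X, y_true)
print(search.best_params_)
```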
Is it possible to use sklearn.model_selection.GridSearchCV without splitting the data?