I have a CatBoost classifier that predicts on some embedding features. AFAIK, embedding features can only be specified through a Pool, meaning I have to create a Pool and pass it to the classifier's .fit method for the model to pick them up.
These embedding features are generated by a TfidfVectorizer, so I would like to wrap the TfidfVectorizer and the classifier in an sklearn Pipeline. That would tidy up my code and give me a single, clear pipeline for training and prediction.
Unfortunately, I cannot pass a CatBoost Pool to an sklearn Pipeline. When I do, I get the following error:
Expected 2D array, got scalar array instead:
array=<catboost.core.Pool object at 0x7f98f0256820>.
Is there any way around this?