Questions tagged [scikit-learn-pipeline]
92 questions
                    
43 votes, 4 answers
        Invalid parameter for sklearn estimator pipeline
I am implementing an example from the O'Reilly book "Introduction to Machine Learning with Python", using Python 2.7 and sklearn 0.16.
The code I am using:
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
param_grid =…
         
    
    
sudo_coffee (888 reputation; badges: 1, 12, 26)
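
The excerpt above is truncated, so the asker's actual `param_grid` is unknown. As a hedged sketch of one common cause of this kind of error: `make_pipeline` names each step after its lowercased class name, and grid keys must follow the `<step>__<parameter>` pattern. The grid values below are hypothetical and chosen only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())

# make_pipeline derives step names from the lowercased class names.
step_names = [name for name, _ in pipe.steps]

# Hypothetical grid (not the asker's): keys are "<step name>__<parameter>".
param_grid = {
    "logisticregression__C": [0.1, 1, 10],
    "tfidfvectorizer__ngram_range": [(1, 1), (1, 2)],
}

# Every valid key appears in get_params(); anything else would be rejected
# by GridSearchCV as an invalid parameter.
valid_keys = set(pipe.get_params().keys())
unknown_keys = [key for key in param_grid if key not in valid_keys]
print(step_names, unknown_keys)
```

Checking the keys against `pipe.get_params()` before running a search is a cheap way to catch a misspelled step or parameter name.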

41 votes, 4 answers
        return coefficients from Pipeline object in sklearn
I've fit a Pipeline object with RandomizedSearchCV
pipe_sgd = Pipeline([('scl', StandardScaler()),
                    ('clf', SGDClassifier(n_jobs=-1))])
param_dist_sgd = {'clf__loss': ['log'],
                 'clf__penalty': [None, 'l1', 'l2',…
         
    
    
spies006 (2,867 reputation; badges: 2, 19, 28)
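
A minimal sketch of one way to reach a fitted step's coefficients after a search, assuming the step names `scl` and `clf` from the excerpt. The synthetic data and the `clf__alpha` distribution are assumptions made for illustration, and the default `SGDClassifier` loss is used rather than the asker's setting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe_sgd = Pipeline([
    ("scl", StandardScaler()),
    ("clf", SGDClassifier(random_state=0)),
])

# Hypothetical search space, not the one from the question.
param_dist_sgd = {"clf__alpha": [1e-4, 1e-3, 1e-2]}

search = RandomizedSearchCV(pipe_sgd, param_dist_sgd, n_iter=3, cv=3, random_state=0)
search.fit(X, y)

# best_estimator_ is the refitted Pipeline; named_steps exposes each fitted step.
coefs = search.best_estimator_.named_steps["clf"].coef_
print(coefs.shape)
```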

25 votes, 2 answers
        Is it possible to toggle a certain step in sklearn pipeline?
I wonder if we can set up an "optional" step in sklearn.pipeline. For example, for a classification problem, I may want to try an ExtraTreesClassifier with AND without a PCA transformation ahead of it. In practice, it might be a pipeline with an…
         
    
    
dolaameng (1,397 reputation; badges: 1, 17, 24)
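
One possible approach in recent scikit-learn versions (0.21 or later) is to treat the step itself as a searchable parameter and offer the string `"passthrough"` as an alternative. This is a sketch under that assumption; the data, step names, and `n_components` value are made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data (illustrative only).
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

pipe = Pipeline([
    ("pca", PCA(n_components=4)),
    ("clf", ExtraTreesClassifier(n_estimators=25, random_state=0)),
])

# The "pca" step is either a real PCA or "passthrough", which skips the step.
param_grid = {"pca": [PCA(n_components=4), "passthrough"]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

# String form of each candidate that was evaluated for the "pca" step.
tried = [str(params["pca"]) for params in search.cv_results_["params"]]
print(tried)
```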

10 votes, 3 answers
        How to gridsearch over transform arguments within a pipeline in scikit-learn
My goal is to use one model to select the most important variables and another model to use those variables to make predictions. In the example below I am using two instances of RandomForestClassifier, but the second model could be any other…
         
    
    
Jason Sanchez (477 reputation; badges: 2, 6, 19)

5 votes, 2 answers
        How to create pandas output for custom transformers?
There are a lot of changes in scikit-learn 1.2.0 where it supports pandas output for all of the transformers but how can I use it in a custom transformer?
In [1]: Here is my custom transformer which is a standard scaler: 
from sklearn.base import…
         
    
    
Armando Bridena (237 reputation; badges: 3, 10)

5 votes, 2 answers
        Sklearn Pipeline: How to build for kmeans, clustering text?
I have text as shown :
 list1 = ["My name is xyz", "My name is pqr", "I work in abc"]
The above will be training set for clustering text using kmeans.
list2 = ["My name is xyz", "I work in abc"]
The above is my test set.
I have built a vectorizer…
         
    
    
user1452759 (8,810 reputation; badges: 15, 42, 58)

4 votes, 1 answer
        How can I check the changes made by Scikit-Learn Pipeline?
This is a very straightforward question, but I couldn't find the answer anywhere. I tried Google, TDS, Analytics Vidhya, StackOverflow, etc... so, here's the thing, I'm using Scikit-Learn Pipelines, but I wanted to see how my data was treated by the…
         
    
    
Yuxxxxxx (203 reputation; badges: 1, 5)
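
A hedged sketch of one way to inspect intermediate data: a fitted `Pipeline` can be sliced (scikit-learn 0.21 or later), and individual fitted steps are reachable through `named_steps`. The toy array and step names below are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with one missing value (illustrative only).
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)

# Everything except the final estimator: the data as the classifier sees it.
X_before_clf = pipe[:-1].transform(X)

# Output of the first step alone, applied to the raw input.
X_after_impute = pipe.named_steps["impute"].transform(X)
print(X_after_impute)
print(X_before_clf)
```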

3 votes, 0 answers
        How to use a different feature set for each estimator in a multi-estimator sklearn pipeline
Below is an example sklearn pipeline. There are two sklearn StackingClassifiers:
stackingclassifier1 with base classifier as RandomForestClassifier & stackingclassifier2 as Meta Learner.
stackingclassifier2 with base classifier as…
         
    
    
Jyoti Hassanandani (91 reputation; badges: 5)

3 votes, 1 answer
        SimpleImputer object has no attribute _fit_dtype
I have a trained scikit-learn model pipeline (including a SimpleImputer) that I'm trying to put into production. However, I get the following error when running it in the production environment.
SimpleImputer object has no attribute _fit_dtype
How…
         
    
    
Jakob (663 reputation; badges: 7, 25)

3 votes, 1 answer
        How can I get feature names when there is a preprocessor before feature selection?
I tried checking some posts like this, this and this but I still couldn't find what I need.
These are the transformations I'm doing:
cat_transformer = Pipeline(steps=[("encoder", TargetEncoder())])
num_transformer = Pipeline(
    steps=[
       …
         
    
    
dsbr__0 (241 reputation; badges: 1, 3)

3 votes, 0 answers
        How to fit Sklearn Pipeline on Catboost Classifier with Embedding features
I have a Catboost Classifier that predicts on some embedding features, and AFAIK these embedding features can only be specified through Pools (meaning I have to create a pool and then pass the pool for the Catboost classifier's .fit method in order…
         
    
    
Edouard Malet (51 reputation; badges: 1)

3 votes, 1 answer
        How to train an sklearn pipeline in AWS?
Working within a Sagemaker Jupyter Notebook I have an XGBoost pipeline which transforms my data and also runs some feature selection:
steps_xgb = [('scaler', MinMaxScaler()),
         ('feature_reduction', SelectKBest(mutual_info_classif)),
        …
         
    
    
quantumofnolace (125 reputation; badges: 7)

2 votes, 2 answers
        How can I use sklearn's make_column_selector to select all valid datetime columns?
I want to select columns based on their datetime data types. My DataFrame has for example columns with types np.dtype('datetime64[ns]'), np.datetime64 and 'datetime64[ns, UTC]'.
Is there a generic way to select all columns with a datetime…
         
    
    
JAdel (1,309 reputation; badges: 1, 7, 24)

2 votes, 2 answers
        Error finding attribute `feature_names_in_` that exists in docs
I'm getting the error AttributeError: 'LogisticRegression' object has no attribute 'feature_names_in_' even though that attribute is written in the docs.
I'm on scikit-learn version 1.0.2.
I created an object LogisticRegression and I am trying to…
         
    
    
sanderlin2013 (31 reputation; badges: 6)

2 votes, 1 answer
        How to preserve column names in scikit-learn ColumnTransformer?
I'm creating some pipelines using scikit-learn, but I'm having some trouble keeping the variable names as the original names, and not in the transformer_name__feature_name format.
This is the scenario:
I have a set of transformers, both custom and…
         
    
    
Rodrigo A (657 reputation; badges: 7, 23)