For reference:
- Python 3.8.3
- sklearn 1.0.2
I have a scikit-learn pipeline that preprocesses my data before model fitting. I define it like so:
# Pipeline 1
import numpy as np
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Select columns by dtype: object -> categorical, np.number -> numeric
cat_selector = make_column_selector(dtype_include=object)
num_selector = make_column_selector(dtype_include=np.number)

# Categorical: one-hot encode; numeric: median-impute (with missing indicators), then scale to [-1, 1]
cat_linear_processor = OneHotEncoder(handle_unknown="ignore", drop='first', sparse=False)
num_linear_processor = make_pipeline(SimpleImputer(strategy="median", add_indicator=True), MinMaxScaler(feature_range=(-1, 1)))
linear_preprocessor = make_column_transformer((num_linear_processor, num_selector), (cat_linear_processor, cat_selector))

model_params = {'alpha': 0.0013879181970625643,
                'l1_ratio': 0.9634269882730605,
                'fit_intercept': True,
                'normalize': False,
                'max_iter': 245.69684524349375,
                'tol': 0.01855761485447601,
                'positive': False,
                'selection': 'random'}
model = ElasticNet(**model_params)
pipeline = make_pipeline(linear_preprocessor, model)
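For context, I fit the pipeline on a pandas DataFrame with a mix of numeric and object-dtype columns; the frame below is just a made-up stand-in for my real data:

import pandas as pd

# Made-up stand-in for my real data: two numeric columns (with some NaNs,
# so the missing-value indicators kick in) and one object-dtype column.
X = pd.DataFrame({
    "age": [25.0, 32.0, None, 41.0],
    "income": [52_000.0, 64_000.0, 58_000.0, None],
    "city": ["NY", "SF", "NY", "LA"],
})
y = [1.0, 2.5, 1.8, 3.2]

pipeline.fit(X, y)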
pipeline.steps yields:
[('columntransformer',
  ColumnTransformer(transformers=[('pipeline',
                                   Pipeline(steps=[('simpleimputer',
                                                    SimpleImputer(add_indicator=True,
                                                                  strategy='median')),
                                                   ('minmaxscaler',
                                                    MinMaxScaler(feature_range=(-1,
                                                                                1)))]),
                                   <sklearn.compose._column_transformer.make_column_selector object at 0x0000029CA3231EE0>),
                                  ('onehotencoder',
                                   OneHotEncoder(handle_unknown='ignore',
                                                 sparse=False),
                                   <sklearn.compose._column_transformer.make_column_selector object at 0x0000029CA542F040>)])),
 ('elasticnet',
  ElasticNet(alpha=0.0013879181970625643, l1_ratio=0.9634269882730605,
             max_iter=245.69684524349375, normalize=False, selection='random',
             tol=0.01855761485447601))]
I am trying to retrieve the feature names of the transformed data that the model is trained/tested on.
I have tried the approaches from several related questions:
- Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer
- Can You Consistently Keep Track of Column Labels Using Sklearn's Transformer API?
- Use ColumnTransformer.get_feature_names to create a reverse feature mapping
However, these solutions have not worked. For example:
[i for i in v.get_feature_names() for k, v in pipeline.named_steps.items() if hasattr(v,'get_feature_names')]
Yields:
----> 1 [i for i in v.get_feature_names() for k, v in pipeline.named_steps.items() if hasattr(v,'get_feature_names')]
NameError: name 'v' is not defined
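As far as I can tell, the NameError is just the comprehension clauses being in the wrong order; a corrected sketch (binding v before it is used) is below, but I don't see how it could give me the encoded column names either, since it only queries the top-level steps (columntransformer, elasticnet):

# Clause order fixed so (step_name, step) is bound before step is used.
# This still only inspects the top-level pipeline steps, so it does not
# surface the encoded column names created inside the ColumnTransformer.
feature_names = [
    name
    for step_name, step in pipeline.named_steps.items()
    if hasattr(step, "get_feature_names")
    for name in step.get_feature_names()
]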
I tried:
pipeline[:-1].get_feature_names_out()
Yields:
AttributeError: Estimator simpleimputer does not provide get_feature_names_out. Did you mean to call pipeline[:-1].get_feature_names_out()?
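The only workaround I can think of is to rebuild the names by hand from the fitted sub-transformers, roughly as sketched below (the missing_ prefix for the indicator columns is just my own naming), but that feels brittle:

ct = pipeline.named_steps["columntransformer"]

# Numeric block: the fitted ColumnTransformer stores the columns each
# selector resolved to as (name, fitted_transformer, columns) tuples.
num_cols = list(ct.transformers_[0][2])
imputer = ct.named_transformers_["pipeline"].named_steps["simpleimputer"]
# add_indicator=True appends one indicator column per numeric feature that
# had missing values at fit time; their indices live in indicator_.features_
indicator_cols = [f"missing_{num_cols[i]}" for i in imputer.indicator_.features_]

# Categorical block: the fitted OneHotEncoder can name its own output columns.
cat_cols = list(ct.transformers_[1][2])
ohe = ct.named_transformers_["onehotencoder"]
ohe_cols = list(ohe.get_feature_names_out(cat_cols))

# Output order matches the transformer order in the ColumnTransformer.
feature_names = num_cols + indicator_cols + ohe_cols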
How can I retrieve feature names after encoding from my current pipeline?