I have a concrete problem with extending the xgb.XGBClassifier class, but it could be framed as a general OOP question.
My implementation is based on: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py
Basically, I want to add feature-name handling when the provided data is a pandas DataFrame.
A few remarks:
- XGBClassifierN has the same parameters in __init__ as the base class xgb.XGBClassifier,
- there is an additional attribute self.feature_names that is set later by the fit method,
- the rest could be done by mix-ins.
It works.
What bothers me is the wall of code in __init__. It was produced by copy-pasting the defaults, and every time xgb.XGBClassifier changes it has to be updated.
Is there any way to concisely express the idea that the child class XGBClassifierN has the same parameters and defaults as the parent class xgb.XGBClassifier, and still be able to do things like clf = XGBClassifierN(n_jobs=-1) later?
I've tried to use only **kwargs, but it doesn't work out: the interpreter starts to complain that there is no missing parameter (no pun intended), and to make it work you basically need to set a few more parameters explicitly.
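For reference, the attempt looked roughly like this (a minimal sketch, not my exact code):

import xgboost as xgb

class XGBClassifierN(xgb.XGBClassifier):
    def __init__(self, **kwargs):
        # Forward everything to the parent and rely on its defaults.
        super().__init__(**kwargs)
        self.feature_names = None

As far as I understand, scikit-learn discovers estimator parameters by introspecting the __init__ signature, so a **kwargs-only signature exposes none of them; that seems to be where the complaint about the missing parameter comes from.

Here is the full working version: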
import xgboost as xgb

class XGBClassifierN(xgb.XGBClassifier):
    def __init__(self, base_score=0.5, booster='gbtree', colsample_bylevel=1,
                 colsample_bynode=1, colsample_bytree=1, gamma=0,
                 learning_rate=0.1, max_delta_step=0, max_depth=3,
                 min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
                 nthread=None, objective='binary:logistic', random_state=0,
                 reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
                 silent=None, subsample=1, verbosity=1, **kwargs):
        super().__init__(base_score=base_score, booster=booster,
                         colsample_bylevel=colsample_bylevel,
                         colsample_bynode=colsample_bynode,
                         colsample_bytree=colsample_bytree, gamma=gamma,
                         learning_rate=learning_rate, max_delta_step=max_delta_step,
                         max_depth=max_depth, min_child_weight=min_child_weight,
                         missing=missing, n_estimators=n_estimators, n_jobs=n_jobs,
                         nthread=nthread, objective=objective,
                         random_state=random_state, reg_alpha=reg_alpha,
                         reg_lambda=reg_lambda, scale_pos_weight=scale_pos_weight,
                         seed=seed, silent=silent, subsample=subsample,
                         verbosity=verbosity, **kwargs)
        self.feature_names = None

    def fit(self, X, y=None):
        # Remember the column names of the DataFrame used for training.
        self.feature_names = list(X.columns)
        return super().fit(X, y)

    def get_feature_names(self):
        if not isinstance(self.feature_names, list):
            raise ValueError('Must fit data first!')
        return self.feature_names

    def get_feature_importances(self):
        return dict(zip(self.get_feature_names(), self.feature_importances_))
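For context, the intended usage is along these lines (toy data from scikit-learn, just to make the example self-contained):

import pandas as pd
from sklearn.datasets import load_breast_cancer

# Any DataFrame with named columns works the same way.
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

clf = XGBClassifierN(n_jobs=-1)
clf.fit(X, y)
print(clf.get_feature_names()[:3])
print(clf.get_feature_importances()['mean radius'])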