I already know "xgboost.XGBRegressor is a Scikit-Learn Wrapper interface for XGBoost."
But do they have any other difference?
I already know "xgboost.XGBRegressor is a Scikit-Learn Wrapper interface for XGBoost."
But do they have any other difference?
xgboost.train is the low-level API to train a model via the gradient boosting method.
xgboost.XGBRegressor and xgboost.XGBClassifier are the wrappers (Scikit-Learn-like wrappers, as they call them) that prepare the DMatrix and pass in the corresponding objective function and parameters. In the end, the fit call simply boils down to:
```python
self._Booster = train(params, dmatrix,
                      self.n_estimators, evals=evals,
                      early_stopping_rounds=early_stopping_rounds,
                      evals_result=evals_result, obj=obj, feval=feval,
                      verbose_eval=verbose)
```
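For illustration, here is a minimal sketch (not the library's internals) of how a wrapper call maps onto the low-level API; some parameter names differ, e.g. the wrapper's n_estimators corresponds to num_boost_round, and learning_rate to eta:

```python
import numpy as np
import xgboost as xgb

X, y = np.random.rand(500, 20), np.random.rand(500)

# Scikit-Learn wrapper: accepts numpy/pandas input directly
reg = xgb.XGBRegressor(n_estimators=50, learning_rate=0.1, max_depth=4)
reg.fit(X, y)

# roughly equivalent call through the low-level API
booster = xgb.train(
    {'objective': 'reg:squarederror', 'eta': 0.1, 'max_depth': 4},
    xgb.DMatrix(X, label=y),
    num_boost_round=50,
)
```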
This means that everything that can be done with XGBRegressor and XGBClassifier is doable via the underlying xgboost.train function. The reverse is obviously not true: for instance, some useful parameters of xgboost.train are not supported in the XGBModel API. The list of notable differences includes:
- xgboost.train allows setting callbacks applied at the end of each iteration.
- xgboost.train allows training continuation via the xgb_model parameter.
- xgboost.train allows not only minimization of the eval function, but maximization as well.

@Maxim, as of xgboost 0.90 (or much earlier), these differences don't exist anymore, in that xgboost.XGBClassifier.fit:

- supports callbacks
- supports the xgb_model parameter

What I find is different is evals_result, in that it has to be retrieved separately after fit (clf.evals_result()), and the resulting dict is different because it can't take advantage of the names of the evals in the watchlist (watchlist = [(d_train, 'train'), (d_valid, 'valid')]).
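A minimal sketch of that evals_result difference, assuming xgboost >= 1.6 (where eval_metric is set on the estimator); the metric and dataset names are illustrative:

```python
import numpy as np
import xgboost as xgb

X, y = np.random.rand(1000, 10), np.random.randint(2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# native API: the names given in the watchlist become the result keys
evals_result = {}
xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=10,
          evals=[(dtrain, 'train')], evals_result=evals_result,
          verbose_eval=False)
print(evals_result.keys())  # dict_keys(['train'])

# sklearn wrapper: results are fetched after fit, with auto-generated keys
clf = xgb.XGBClassifier(n_estimators=10, eval_metric='logloss')
clf.fit(X, y, eval_set=[(X, y)], verbose=False)
print(clf.evals_result().keys())  # dict_keys(['validation_0'])
```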
In my opinion, the main difference is training/prediction speed.
For further reference I will call xgboost.train the 'native_implementation' and XGBClassifier.fit the 'sklearn_wrapper'.
I have made some benchmarks on a dataset of shape (240000, 348).
Fit/train time:

- sklearn_wrapper: 89 seconds
- native_implementation: 7 seconds

Prediction time:

- sklearn_wrapper: 6 seconds
- native_implementation: 3.5 milliseconds
I believe this is explained by the fact that the sklearn_wrapper is designed to take pandas/numpy objects as input, whereas the native_implementation needs the input data to be converted into an xgboost.DMatrix object.
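A minimal sketch of that input-type difference:

```python
import numpy as np
import xgboost as xgb

X, y = np.random.rand(100, 5), np.random.rand(100)

# sklearn wrapper: numpy/pandas objects go in directly
xgb.XGBRegressor(n_estimators=5).fit(X, y)

# native API: data must be wrapped in a DMatrix first
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({'objective': 'reg:squarederror'}, dtrain,
                    num_boost_round=5)
```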
In addition, one can optimise n_estimators using the native_implementation, as sketched below.
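One way to do that is with xgb.cv and early stopping (the parameter values here are illustrative):

```python
import numpy as np
import xgboost as xgb

X, y = np.random.rand(1000, 10), np.random.rand(1000)
dtrain = xgb.DMatrix(X, label=y)

# cross-validated boosting with early stopping; training stops once the
# test metric has not improved for 10 rounds
cv_results = xgb.cv({'objective': 'reg:squarederror'}, dtrain,
                    num_boost_round=500, nfold=5, early_stopping_rounds=10)

# one row per boosting round actually kept, so the row count is a good
# choice for n_estimators
print(len(cv_results))
```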
@Danil suggests there is a significant difference in speed, and @Mohammad correctly points out the need to convert the data to the DMatrix structure. So I have tried to replicate the benchmark in the Kaggle notebook environment.
The results showed no major training/prediction speed difference between the xgboost native implementation and the sklearn_wrapper.
```python
import numpy as np
import xgboost as xgb

xgb.__version__
```

'1.6.1'

```python
# training data
X = np.random.rand(240000, 348)
y = np.random.rand(240000)
```

```python
%%time
# convert training data
dtrain = xgb.DMatrix(X, label=y)
```

CPU times: user 3.61 s, sys: 505 ms, total: 4.12 s
Wall time: 1.56 s

```python
%%time
# train the model with default parameters
model = xgb.train({'objective': 'reg:squarederror'}, dtrain, 10)
```

CPU times: user 6min 8s, sys: 700 ms, total: 6min 9s
Wall time: 1min 34s

```python
%%time
# predict with the trained model
prediction = model.predict(dtrain)
```

CPU times: user 818 ms, sys: 1.01 ms, total: 819 ms
Wall time: 209 ms

```python
%%time
# train via the sklearn wrapper with the same number of rounds
model = xgb.XGBRegressor(n_estimators=10)
model.fit(X, y)
```

CPU times: user 6min 15s, sys: 1.2 s, total: 6min 16s
Wall time: 1min 37s

XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None, colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise', importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_bin=256, max_cat_to_onehot=4, max_delta_step=0, max_depth=6, max_leaves=0, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=10, n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0, reg_lambda=1, ...)

```python
%%time
# predict via the sklearn wrapper
prediction_1 = model.predict(X)
```

CPU times: user 1.48 s, sys: 1.99 ms, total: 1.48 s
Wall time: 380 ms