You just need to standardize your original DataFrame with z-scores first and then run the linear regression.
Assume your DataFrame is named df and has independent variables x1, x2, and x3 and a dependent variable y. Consider the following code:
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf
# standardize: keep the numeric columns, drop rows with missing values,
# then z-score each column
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)
# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()
# checking results
result.summary()
The coef column of the summary now shows the standardized (beta) coefficients, so you can compare how strongly each independent variable influences the dependent variable.
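Here is a minimal, self-contained sketch of the same idea on synthetic data (the variable names and coefficients are made up for illustration); because every column is z-scored, the fitted intercept is essentially zero and the standardized coefficients are directly comparable:

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Hypothetical data: y depends much more strongly on x1 than on x2
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
df["y"] = 3.0 * df["x1"] + 0.5 * df["x2"] + rng.normal(size=200)

# z-score every column, then fit OLS on the standardized data
df_z = df.apply(stats.zscore)
result = smf.ols("y ~ x1 + x2", data=df_z).fit()
print(result.params.round(3))
```

The standardized coefficient for x1 comes out much larger in absolute value than the one for x2, which is exactly the comparison the standardization buys you.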
Notes:
- Keep the .dropna() call: stats.zscore returns all NaN for a column that contains any missing values.
- Instead of using .select_dtypes(), you can select the columns manually, but make sure every column you select is numeric.
- If you only care about the standardized (beta) coefficients, result.params returns just those. They may be displayed in scientific notation; use something like round(result.params, 5) to round them for readability.
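The first note above is easy to verify directly. This small sketch (toy data, made up for illustration) shows stats.zscore turning an entire column into NaN when a single value is missing, and working again once the NaN is dropped:

```python
import numpy as np
import pandas as pd
from scipy import stats

# A toy column with one missing value
s = pd.Series([1.0, 2.0, np.nan, 4.0])

z_with_nan = stats.zscore(s)           # every entry is NaN
z_clean = stats.zscore(s.dropna())     # finite z-scores after dropping the NaN

print(z_with_nan)
print(z_clean)
```

Recent SciPy versions also accept a nan_policy argument (e.g. stats.zscore(s, nan_policy="omit")) that computes the mean and standard deviation while ignoring NaNs, if you prefer not to drop rows.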