You just need to standardize your original DataFrame with z-scores first and then run the linear regression.
Assume your DataFrame is named df and has independent variables x1, x2, and x3 and a dependent variable y. Consider the following code:
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf
# standardize: keep the numeric columns, drop rows with missing values,
# then z-score each column
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)
# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()
# checking results
result.summary()
The coef column of the summary now shows the standardized (beta) coefficients, so you can compare how strongly each independent variable influences the dependent variable.
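Here is a minimal, self-contained sketch of the same idea on synthetic data (the variable names and coefficients are made up for illustration); because every column is z-scored, the fitted intercept is essentially zero and the standardized coefficients are directly comparable:

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Hypothetical data: y depends much more strongly on x1 than on x2
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
df["y"] = 3.0 * df["x1"] + 0.5 * df["x2"] + rng.normal(size=200)

# z-score every column, then fit OLS on the standardized data
df_z = df.apply(stats.zscore)
result = smf.ols("y ~ x1 + x2", data=df_z).fit()
print(result.params.round(3))
```

The standardized coefficient for x1 comes out much larger in absolute value than the one for x2, which is exactly the comparison the standardization buys you.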
Notes:
- Keep the .dropna() call: stats.zscore returns all NaN for a column that contains any missing values.
- Instead of using .select_dtypes(), you can select the columns manually, but make sure every column you select is numeric.
- If you only care about the standardized (beta) coefficients, result.params returns just those. They may be displayed in scientific notation; use something like round(result.params, 5) to round them for readability.
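The first note above is easy to verify directly. This small sketch (toy data, made up for illustration) shows stats.zscore turning an entire column into NaN when a single value is missing, and working again once the NaN is dropped:

```python
import numpy as np
import pandas as pd
from scipy import stats

# A toy column with one missing value
s = pd.Series([1.0, 2.0, np.nan, 4.0])

z_with_nan = stats.zscore(s)           # every entry is NaN
z_clean = stats.zscore(s.dropna())     # finite z-scores after dropping the NaN

print(z_with_nan)
print(z_clean)
```

Recent SciPy versions also accept a nan_policy argument (e.g. stats.zscore(s, nan_policy="omit")) that computes the mean and standard deviation while ignoring NaNs, if you prefer not to drop rows.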