from sklearn.preprocessing import LabelBinarizer
vs
from sklearn.preprocessing import LabelEncoder
What is the difference between LabelEncoder and LabelBinarizer, and when should each one be used?
Thanks in advance.
LabelEncoder does not create a dummy variable for each category in your data, whereas LabelBinarizer does. Here is an example from the documentation.
from sklearn.preprocessing import LabelBinarizer, LabelEncoder
data1 = [1, 2, 2, 6]
lb = LabelBinarizer()
le = LabelEncoder()
# One column per distinct value (1, 2, 6), with a single 1 in each row
print('LabelBinarizer output \n', lb.fit_transform(data1))
# LabelBinarizer output
# [[1 0 0]
#  [0 1 0]
#  [0 1 0]
#  [0 0 1]]
# One integer code per distinct value, assigned in sorted order
print('LabelEncoder output \n', le.fit_transform(data1))
# LabelEncoder output
# [0 1 1 2]
Hence, if you just want to encode the categories as integers 0, 1, 2, 3, etc., use LabelEncoder. If you want to create a dummy variable for each category, use LabelBinarizer.
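To make the difference a bit more concrete, here is a minimal sketch using string labels. The label values ("cat", "dog", "bird", "yes", "no") are made up for illustration and are not from the original example. It also shows one behaviour that may be surprising: with exactly two classes, LabelBinarizer returns a single 0/1 column rather than two dummy columns.

```python
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

labels = ["cat", "dog", "dog", "bird"]  # hypothetical example labels

# LabelEncoder: one integer per class, assigned in sorted class order
le = LabelEncoder()
encoded = le.fit_transform(labels)
print(le.classes_)                    # ['bird' 'cat' 'dog']
print(encoded)                        # [1 2 2 0]
print(le.inverse_transform(encoded))  # maps the codes back to the labels

# LabelBinarizer: one indicator column per class (3 classes -> 3 columns)
lb = LabelBinarizer()
onehot = lb.fit_transform(labels)
print(onehot.shape)                   # (4, 3)
print(lb.inverse_transform(onehot))   # maps the rows back to the labels

# With exactly two classes, the result is a single 0/1 column
binary = LabelBinarizer().fit_transform(["no", "yes", "yes", "no"])
print(binary.shape)                   # (4, 1)
```

Both transformers keep the fitted classes in `classes_` and support `inverse_transform`, so either encoding can be mapped back to the original labels.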