You can use value_counts with numpy.where, where is condition with isin.
If your variable is of type object see below. If your variable is of type category, then skip down toward the bottom.
df = pd.DataFrame({'Color':'Red Red Blue Red Violet Blue'.split(),
'Value':[11,150,50,30,10,40]})
print (df)
Color Value
0 Red 11
1 Red 150
2 Blue 50
3 Red 30
4 Violet 10
5 Blue 40
a = df.Color.value_counts()
print (a)
Red 3
Blue 2
Violet 1
Name: Color, dtype: int64
#get top 2 values of index
vals = a[:2].index
print (vals)
Index(['Red', 'Blue'], dtype='object')
df['new'] = np.where(df.Color.isin(vals), 0,1)
print (df)
Color Value new
0 Red 11 0
1 Red 150 0
2 Blue 50 0
3 Red 30 0
4 Violet 10 1
5 Blue 40 0
Or if need replace all not top values use where:
df['new1'] = df.Color.where(df.Color.isin(vals), 'other')
print (df)
Color Value new1
0 Red 11 Red
1 Red 150 Red
2 Blue 50 Blue
3 Red 30 Red
4 Violet 10 other
5 Blue 40 Blue
For category type:
df = pd.DataFrame({'Color':'Red Red Blue Red Violet Blue'.split(),
'Value':[11,150,50,30,10,40]})
df.Color = df.Color.astype('category')
a= df.Color.value_counts()[:2].index
print(a)
CategoricalIndex(['Red', 'Blue'],
categories=['Blue', 'Red', 'Violet'],
ordered=False, dtype='category')
Notice that violet is still a category. So we need .remove_unused_categories().
vals = df.Color.value_counts()[:2].index.remove_unused_categories()
CategoricalIndex(['Red', 'Blue'],
categories=['Blue', 'Red'],
ordered=False, dtype='category')
As mentioned in the comments, a ValueError will occur when setting the new variable. The way around that is type changing.
df['new1'] = df.Color.astype('object').where(df.Color.isin(vals), 'other')
df['new1'] = df['new1'].astype('category')