Creating subsets on multiple features in Python for segmentation

Question

I want to segment a dataset of items (labeled with IDs) that have several categorical features, each taking a different set of values (for instance, color takes 'blue', 'orange', 'green'; size takes 'S', 'M', 'L'; brand takes 'Brand A', 'Brand B', etc.):

ID	Brand	Color	Size	Price
1	Brand 1	Orange	S	23
2	Brand 2	Blue	XXL	3
3	Brand 1	Green	XXXL	45
4	Brand 2	Blue	M	200

I can easily do it by hand for one or two features (with a small number of values). For example, if I segment by brand I get:

ID	Brand	Color	Size	Price
1	Brand 1	Orange	S	23
3	Brand 1	Green	XXXL	45

and

ID	Brand	Color	Size	Price
2	Brand 2	Blue	XXL	3
4	Brand 2	Blue	M	200

Unfortunately, some features take 10+ values. Moreover, the number of subsets explodes if I segment on more than one feature. I want to test different levels of segmentation (e.g. color + brand, color + brand + size), which is why I don't want to do it by hand.
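
To give a sense of how quickly the number of subsets grows, here is a minimal sketch that counts the non-empty segments for every combination of features. It assumes pandas is available and rebuilds the sample table above as a DataFrame; the names `df`, `features`, and `segment_counts` are chosen here for illustration only.

```python
from itertools import combinations

import pandas as pd

# Sample data from the table above (variable names are illustrative).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Brand": ["Brand 1", "Brand 2", "Brand 1", "Brand 2"],
    "Color": ["Orange", "Blue", "Green", "Blue"],
    "Size": ["S", "XXL", "XXXL", "M"],
    "Price": [23, 3, 45, 200],
})

features = ["Brand", "Color", "Size"]

# For each combination of 1, 2, ... features, count how many distinct
# value combinations actually occur in the data (i.e. non-empty segments).
segment_counts = {}
for r in range(1, len(features) + 1):
    for combo in combinations(features, r):
        segment_counts[combo] = df.groupby(list(combo)).ngroups

for combo, n in segment_counts.items():
    print(" + ".join(combo), "->", n, "segments")
```

Only combinations that actually appear in the data are counted, so the result can be far smaller than the full product of the number of values per feature.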

I am trying to write a function that takes the dataframe and a list of features as input and outputs all the different subsets, but so far my code doesn't work.
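
One possible shape for such a function, sketched with pandas `groupby`, is shown here. This is an assumption about what would fit, not a definitive implementation: the function name `segment` and the choice to return a dict keyed by tuples of feature values are hypothetical.

```python
import pandas as pd


def segment(df, features):
    """Split df into one sub-DataFrame per observed combination of values
    in the given feature columns.

    Returns a dict mapping a tuple of feature values to the matching rows.
    """
    segments = {}
    # Iterating over a groupby yields (key, sub-DataFrame) pairs.
    # With default settings, rows with missing values in the grouping
    # columns are left out of the groups.
    for key, subset in df.groupby(list(features), sort=False):
        # Depending on the pandas version, a single grouping column may
        # yield a scalar key; normalize to a tuple for a uniform interface.
        if not isinstance(key, tuple):
            key = (key,)
        segments[key] = subset
    return segments


# Sample data from the question's table.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Brand": ["Brand 1", "Brand 2", "Brand 1", "Brand 2"],
    "Color": ["Orange", "Blue", "Green", "Blue"],
    "Size": ["S", "XXL", "XXXL", "M"],
    "Price": [23, 3, 45, 200],
})

by_brand = segment(df, ["Brand"])
by_brand_color = segment(df, ["Brand", "Color"])

for key, subset in by_brand_color.items():
    print(key)
    print(subset, end="\n\n")
```

Calling it with a longer feature list (for example `["Color", "Brand", "Size"]`) gives a finer level of segmentation without any change to the function.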

Thank you in advance if you think you can help me!

Not enough reputation to comment, but [see this thread](https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key) — Cory Nezin, Feb 15 '22 at 15:18
