I have a line of code:
g = x.groupby('Color')
The colors are Red, Blue, Green, Yellow, Purple, Orange, and Black. How do I return this list? For similar attributes, I use x.Attribute and it works fine, but x.Color doesn't behave the same way.
 
    
There is a much easier way of doing it:
g = x.groupby('Color')
g.groups.keys()
Calling groupby() gives you a GroupBy object whose .groups attribute is a dict-like mapping from each group name to the row labels that belong to that group.
You can get the group names from that mapping with its keys() method (wrap it in list() if you need an actual list).
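For a concrete picture, here is a minimal sketch with a made-up DataFrame standing in for the x from the question:
import pandas as pd
x = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Red'],
                  'Qty': [1, 2, 3, 4]})   # made-up sample data
g = x.groupby('Color')
print(list(g.groups.keys()))   # ['Blue', 'Green', 'Red'] -- sorted, since sort=True is the default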
If you do not care about the order of the groups, Yanqi Ma's answer will work fine:
g = x.groupby('Color')
g.groups.keys()
list(g.groups) # or this
However, note that g.groups is a dictionary, so on Python versions before 3.7 its keys are inherently unordered! This holds even if you pass sort=True to the groupby method (the default), which sorts the groups themselves.
This actually bit me hard when it resulted in a different order on two platforms, especially since I was using list(g.groups), so it wasn't obvious at first that g.groups was a dict.
In my opinion, the best way to do this is to take advantage of the fact that the GroupBy object has an iterator, and use a list comprehension to return the groups in the order they exist in the GroupBy object:
g = x.groupby('Color')
groups = [name for name, unused_df in g]
It's a little less readable, but this will always return the groups in the correct order.
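If what you actually want is the groups in their order of first appearance rather than sorted, you can also pass sort=False to groupby; iterating then yields the groups in the order they first occur. A small sketch on made-up data:
import pandas as pd
x = pd.DataFrame({'Color': ['Red', 'Blue', 'Red', 'Green']})   # made-up data
print([name for name, _ in x.groupby('Color')])                # ['Blue', 'Green', 'Red'] (sorted; the default)
print([name for name, _ in x.groupby('Color', sort=False)])    # ['Red', 'Blue', 'Green'] (first-appearance order)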
 
    
Here's how to do it:
groups = list()
for g, data in x.groupby('Color'):
    print(g, data)
    groups.append(g)
The core idea here is this: if you iterate over a DataFrame's groupby object, you get back a two-tuple of (group name, filtered DataFrame), where the filtered DataFrame contains only the rows belonging to that group.
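As a small illustration of that (name, frame) shape on made-up data, the sub-frames can just as easily be collected into a dict keyed by group name:
import pandas as pd
x = pd.DataFrame({'Color': ['Red', 'Blue', 'Red'], 'Qty': [1, 2, 3]})   # made-up data
frames = {name: frame for name, frame in x.groupby('Color')}
print(list(frames))          # ['Blue', 'Red']
print(len(frames['Red']))    # 2 -- two rows fall into the 'Red' group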
 
    
It is my understanding that you have a DataFrame which contains multiple columns. One of the columns is "Color", which holds different colors. You want to return a list of the unique colors that exist.
colorGroups = df.groupby('Color')
for c in colorGroups.groups:
    print(c)
The above code will give you each color that exists, without repeating any names. Since groupby sorts the group keys by default, you should get output such as:
Black
Blue
Green
Orange
Purple
Red
Yellow
An alternative is the unique() function, which returns an array of all unique values in a Series. Thus, to get an array of all unique colors, you would do:
df['Color'].unique()
The output is an array, so for example print(df['Color'].unique()[3]) would give you Yellow.
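If you prefer a plain Python list rather than a NumPy array, one small variation (still assuming the same df) is:
colors = df['Color'].unique().tolist()   # unique() keeps first-appearance order
print(colors)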
 
    
I compared runtime for the solutions above (with my data):
In [443]: d = df3.groupby("IND")
In [444]: %timeit groups = [name for name,unused_df in d]
377 ms ± 27.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [445]: %timeit list(d.groups)
1.08 µs ± 47.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [446]: %timeit d.groups.keys()
708 ns ± 7.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [447]: %timeit df3['IND'].unique()
5.33 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It seems that d.groups.keys() is the fastest method; the list comprehension is far slower, presumably because iterating the GroupBy object has to build each sub-DataFrame.
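For anyone who wants to reproduce a comparison like this, here is a rough sketch on synthetic data (the column name IND is kept from above, but the data itself is made up, so absolute timings will differ):
import numpy as np
import pandas as pd
df3 = pd.DataFrame({'IND': np.random.randint(0, 100, size=1_000_000),
                    'val': np.random.rand(1_000_000)})
d = df3.groupby('IND')
# then, in IPython, time each approach:
# %timeit [name for name, unused_df in d]
# %timeit list(d.groups)
# %timeit d.groups.keys()
# %timeit df3['IND'].unique()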
 
    
Hope this helps. Happy coding :)
import pandas as pd
df = pd.DataFrame(data=[['red', '1', '1.5'], ['blue', '20', '2.5'], ['red', '15', '4']],
                  columns=['color', 'column1', 'column2'])
list_req = list(df.groupby('color').groups.keys())
print(list_req)
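With the default sort=True this should print ['blue', 'red'], since the group keys come back sorted.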
