
I am new to pandas and its related libraries. With the following code I can make a scatter plot of my 'class' data in the 'Month' vs 'Amount' plane. Because I have more than one class, I would like to use a different color for each class and show a legend in the figure.

My first attempt below draws the dots of each class in a different color, but it does not generate the right legend. The second attempt, by contrast, can generate the right legend, but the labeling is not correct: only the first letter of each class name is shown. Moreover, the second attempt plots as many figures as there are classes. I would like to know how to correct both attempts. Any ideas or suggestions? Thanks in advance.

P.S. I also wanted to use

colors = itertools.cycle(['gold','blue','red','chocolate','mediumpurple','dodgerblue']) 

so that I could choose the colors myself, but I could not get it to work.

Attempts:

import pandas as pd
import numpy as np
import random 
from matplotlib import pyplot as plt 
import matplotlib.cm as cm

np.random.seed(176)
random.seed(16)

df = pd.DataFrame({'class': random.sample(['living room','dining room','kitchen','car','bathroom','office']*10, k=25),
                   'Amount': np.random.sample(25)*100,
                   'Year': random.sample(list(range(2010,2018))*50, k=25),
                   'Month': random.sample(list(range(1,12))*100, k=25)})

print(df.head(25))

print(df['class'].unique())

for cls1 in df['class'].unique():
    test1= pd.pivot_table(df[df['class']==cls1], index=['class', 'Month', 'Year'], values=['Amount'])
    print(test1)

colors = cm.rainbow(np.linspace(0,2,len(df['class'].unique()))) 

fig, ax = plt.subplots(figsize=(8,6))

for cls1,c in zip(df['class'].unique(),colors): 
    # SCATTER PLOT
    test = pd.pivot_table(df[df['class']==cls1], index=['class', 'Month', 'Year'], values=['Amount'], aggfunc=np.sum).reset_index()    
    test.plot(kind='scatter', x='Month',y='Amount', figsize=(16,6),stacked=False,ax=ax,color=c,s=50).legend(df['class'].unique(),scatterpoints=1,loc='upper left',ncol=3,fontsize=10.5)
plt.show() 


for cls2,c in zip(df['class'].unique(),colors): 
    # SCATTER PLOT
    test = pd.pivot_table(df[df['class']==cls2], index=['class', 'Month', 'Year'], values=['Amount'], aggfunc=np.sum).reset_index()    
    test.plot(kind='scatter', x='Month',y='Amount', figsize=(16,6),stacked=False,color=c,s=50).legend(cls2,scatterpoints=1,loc='upper left',ncol=3,fontsize=10.5)
    plt.show() 


Up-to-date code

I would like to plot the following code via scatter plot.

for cls1 in df['class'].unique():
    test3= pd.pivot_table(df[df['class']==cls1], index=['class', 'Month'], values=['Amount'], aggfunc=np.sum)
    print(test3)

Unlike above, each class now appears only once per month, because Amount is summed.

Here is my attempt:

for cls2 in df['class'].unique():
    test2= pd.pivot_table(df[df['class']==cls2], index=['class','Year'], values=['Amount'], aggfunc=np.sum).reset_index()
    print(test2)
    sns.lmplot(x='Year' , y='Amount', data=test2, hue='class',palette='hls', fit_reg=False,size= 5, aspect=5/3, legend_out=False,scatter_kws={"s": 70})
plt.show() 

This gives me one plot for each class. Apart from the first one (class=car), which shows different colors, the others seem to be OK. Even so, I would like to have only one plot with all classes.

After Marvin Taschenberger's helpful answer, here is the up-to-date result:

[Figure: plot obtained with the code from the answer]

I get a white dot instead of a colored one, and the legend is in a different place than in your figure. Moreover, I cannot see the year labels correctly. Why?

fdrigo
  • Sorry, but could you explain what exactly is wrong with them, i.e. what you dislike and how it should look? I don't really get what is wrong with the first one. Nevertheless, you might want to look at `seaborn`, since it is a library made for plotting and supports `DataFrames` – Marvin Taschenberger Jul 14 '17 at 17:34
  • In the first plot the color-dot associations are wrong, while in the second one the label is just the first letter of the corresponding class name instead of the full name. These two things are what I dislike. Thanks – fdrigo Jul 14 '17 at 19:43

1 Answer


An easy way to work around your problem (unfortunately without solving it) is to let seaborn do the heavy lifting with this single line:

# `size` was renamed to `height` in seaborn 0.9; use size=8 on older versions
sns.lmplot(x='Month', y='Amount', data=df, hue='class', palette='hls', fit_reg=False, height=8, aspect=5/3, legend_out=False)

You can also pass other colors to `palette`, for example a list of color names.
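To connect this with the `itertools.cycle` idea from the question, here is a minimal sketch of how a fixed color could be assigned to each class. The class names and colors are copied from the question; the names `classes` and `class_colors` are only illustrative.

```python
import itertools

# Class names taken from the question's DataFrame.
classes = ['living room', 'dining room', 'kitchen', 'car', 'bathroom', 'office']

# cycle() repeats the colors endlessly, so zip() stops when the classes run out,
# and colors are reused if there are more classes than colors.
color_cycle = itertools.cycle(
    ['gold', 'blue', 'red', 'chocolate', 'mediumpurple', 'dodgerblue'])

# One color per class, in the order the classes are listed.
class_colors = dict(zip(classes, color_cycle))

print(class_colors['kitchen'])  # red
```

seaborn accepts a dictionary that maps `hue` levels to colors, so a mapping like this could be passed as `palette=class_colors` instead of `'hls'`.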

EDIT: How about this, then:

import pandas as pd
import numpy as np
import random 
from matplotlib import pyplot as plt 
import seaborn as sns 

np.random.seed(176)
random.seed(16)

df = pd.DataFrame({'class': random.sample(['living room','dining room','kitchen','car','bathroom','office']*10, k=25),
                   'Amount': np.random.sample(25)*100,
                   'Year': random.sample(list(range(2010,2018))*50, k=25),
                   'Month': random.sample(list(range(1,12))*100, k=25)})

# Sum Amount per (class, Year) once for the whole frame, then plot all classes in one call
frame = pd.pivot_table(df, index=['class','Year'], values=['Amount'], aggfunc=np.sum).reset_index()
# `size` was renamed to `height` in seaborn 0.9; use size=5 on older versions
sns.lmplot(x='Year', y='Amount', data=frame, hue='class', palette='hls', fit_reg=False, height=5, aspect=5/3, legend_out=False, scatter_kws={"s": 70})
plt.show()

[Figure: resulting scatter plot of Amount vs Year, colored by class]
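For completeness, here is a sketch of how the original matplotlib attempts could be repaired without seaborn. It is not the code from the question, only one possible fix under a few assumptions. It uses `groupby(...).sum()` instead of `pivot_table`, which should give the same per-class, per-month sums. It draws every class on one shared `ax`, so only one figure is created. It passes `label=` to each scatter call and calls `ax.legend()` once at the end. In the second attempt, `.legend(cls2, ...)` receives a plain string, which matplotlib treats as a sequence of labels, which is why only the first letter appears. Note also that `np.linspace(0,2,...)` in the question samples the colormap outside its [0, 1] range, so several classes end up with the same end color.

```python
import itertools
import random

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

np.random.seed(176)
random.seed(16)

# Same synthetic data as in the question.
df = pd.DataFrame({
    'class': random.sample(['living room', 'dining room', 'kitchen',
                            'car', 'bathroom', 'office'] * 10, k=25),
    'Amount': np.random.sample(25) * 100,
    'Year': random.sample(list(range(2010, 2018)) * 50, k=25),
    'Month': random.sample(list(range(1, 12)) * 100, k=25),
})

# Sum Amount per (class, Month); assumed equivalent to the question's pivot_table.
summed = df.groupby(['class', 'Month'], as_index=False)['Amount'].sum()

# Colors from the question's P.S.; cycle() reuses them if classes outnumber colors.
color_cycle = itertools.cycle(
    ['gold', 'blue', 'red', 'chocolate', 'mediumpurple', 'dodgerblue'])

# One figure and one axes shared by all classes.
fig, ax = plt.subplots(figsize=(8, 6))
for (cls, group), color in zip(summed.groupby('class'), color_cycle):
    # label= attaches the full class name to this set of dots.
    ax.scatter(group['Month'], group['Amount'], color=color, s=50, label=cls)

ax.set_xlabel('Month')
ax.set_ylabel('Amount')
# A single legend call picks up the labels set above.
legend = ax.legend(scatterpoints=1, loc='upper left', ncol=3, fontsize=10.5)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or store the figure. Because `groupby` sorts its keys, the legend entries appear in alphabetical order of the class names.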

  • Thanks for your answer. However, I can use it only if I do not manipulate my data frame. In the first place I wanted to plot the result of a pivot_table. In what I wrote, using a pivot_table was not so relevant. Going back to my first goal, I tried to use sns.lmplot with the last lines I added to my code above, without success. Any ideas? – fdrigo Jul 15 '17 at 11:03
  • Sorry, but I am still a little confused about what you need/want (blame my English skills). Is this maybe acceptable? [see EDIT above] – Marvin Taschenberger Jul 15 '17 at 12:40
  • Thanks. I meant exactly this. I posted what I obtain by using your code. I get a white dot instead of a colored one, and the legend is not in the same place as in your figure (see above, please). Moreover, I cannot see the correct year labels on the x-axis. Do you know why? Thanks for your help :) – fdrigo Jul 15 '17 at 14:09
  • I can't really say why it is different, but I posted my complete code. If you check it, let me know if the result is still different (I tested in a Jupyter notebook and as a script) – Marvin Taschenberger Jul 15 '17 at 16:01
  • Hi, I just copied your complete code and ran it. It is still not working. Very strange... I will do some research to solve the problem and let you know when I understand what is happening. Thanks – fdrigo Jul 16 '17 at 10:00
  • I solved the issues with the year ticks and the position of the legend. However, I could not find a solution for the white dots. I opened another question; hopefully somebody will answer. Thanks again. Here is the link to the question: https://stackoverflow.com/questions/45127983/seaborn-scatter-plot-with-missing-points-in-the-figure – fdrigo Jul 16 '17 at 11:26