In my regular data analysis work, I have switched to use 100% python since the seaborn package becomes available. Big thanks to this wonderful package. However, One excel-chart feature I miss is to display the polyfit equation and/or R2 value when use the lmplot() function. Does anyone know an easy way to add that?
            Asked
            
        
        
            Active
            
        
            Viewed 6.3k times
        
    29
            
            
        - 
                    possible duplicate of [How do I calculate r-squared using Python and Numpy?](http://stackoverflow.com/questions/893657/how-do-i-calculate-r-squared-using-python-and-numpy) – MattDMo Aug 30 '14 at 05:36
- 
                    7It's not really a duplicate because the question is whether this can be added automatically by the seaborn functions, not how to calculate it manually. – mwaskom Aug 30 '14 at 15:16
2 Answers
34
            
            
        This now can be done using FacetGrid methods .map() or .map_dataframe():
import seaborn as sns
import scipy as sp
tips = sns.load_dataset('tips')
g = sns.lmplot(x='total_bill', y='tip', data=tips, row='sex',
               col='time', height=3, aspect=1)
def annotate(data, **kws):
    r, p = sp.stats.pearsonr(data['total_bill'], data['tip'])
    ax = plt.gca()
    ax.text(.05, .8, 'r={:.2f}, p={:.2g}'.format(r, p),
            transform=ax.transAxes)
    
g.map_dataframe(annotate)
plt.show()
- 
                    Thanks Marcos, if in your annotate(), x, y are changed, how to do it? I am trying to do like this: def annotate(data,x,y), r, p = sp.stats.pearsonr(data[x], data[y]), then g.map_dataframe(annotate(data,x,y), then I got an error of AttributeError: 'NoneType' object has no attribute '__module__'. Thanks for your help – roudan Jul 21 '21 at 20:09
- 
                    1I am not sure if I understand your question. x and y are changed in the four subplots in the example I gave. Maybe you could provide an actual example with code of what you need. In your case, x and y must be columns of the dataframe data, then you should use data['x'], data['y'], with quotes, and not data[x], data[y]. – Marcos Jul 23 '21 at 02:18
- 
                    Thanks Marcos, here is what I did: def annotate(data, x,y,**kws): r, p = sp.stats.pearsonr(data['x'], data[y']) ax = plt.gca() ax.text(.05, .8, 'r={:.2f}, p={:.2g}'.format(r, p), transform=ax.transAxes) g.map_dataframe(annotate(data,x,y) plt.show(), then I got an error for using g.map_dataframe(annotate(data,x,y). How to correct this final line? Thanks – roudan Jul 23 '21 at 03:54
28
            
            
        It can't be done automatically with lmplot because it's undefined what that value should correspond to when there are multiple regression fits (i.e. using a hue, row or col variable.
But this is part of the similar jointplot function. By default it shows the correlation coefficient and p value:
import seaborn as sns
import numpy as np
x, y = np.random.randn(2, 40)
sns.jointplot(x, y, kind="reg")
But you can pass any function. If you want R^2, you could do:
from scipy import stats
def r2(x, y):
    return stats.pearsonr(x, y)[0] ** 2
sns.jointplot(x, y, kind="reg", stat_func=r2)

 
    
    
        mwaskom
        
- 46,693
- 16
- 125
- 127
- 
                    Thanks, I think I can use the jointplot() one by one instead of the nice multiple chart feature of lmplot(). However, can the top/side histograms be optional so that I can pack many into a lmplot() equivalent. – user3287545 Aug 30 '14 at 16:09
- 
                    What is p value (0,22) here? I guess pearson correlation is pearsonr value. – cacert Jan 06 '16 at 20:41
- 
                    @cacert: see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html - probability of seeing such a correlation with two completely independent variables. – naught101 Sep 01 '16 at 03:30
- 
                    25This is no longer supported in Seaborn `0.11`, although it used to work in Seaborn `0.9`. – Seanny123 May 12 '21 at 15:37
 
     
    
 
    