I want to plot foo ~ bar. However, I don't want to look at the exact data; I'd rather break bar into, say, quintiles and plot mean(foo) for each quintile (so my final plot will have 5 data points). Is this possible?
        Xodarap
        
- Have you looked at `?quantile`? – Brandon Bertelsen Mar 31 '13 at 18:28
- Hi there! Please make your post reproducible by having a look at [**How to make a great reproducible example**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for us to help you. Thank you. – Arun Mar 31 '13 at 18:36
2 Answers
6
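    foo <- rnorm(100)
    bar <- rnorm(100)
    # Mean of foo within each quintile of bar. Without include.lowest=TRUE,
    # cut() leaves the minimum of bar outside the first interval (it becomes NA).
    mn.foo.byQ5bar <- tapply(foo, cut(bar, quantile(bar, (0:5)/5, na.rm=TRUE)), mean)
    > mn.foo.byQ5bar
     (-3.31,-0.972] (-0.972,-0.343]  (-0.343,0.317]   (0.317,0.792]    (0.792,2.71] 
         0.13977839      0.03281258     -0.18243804     -0.14242885     -0.01696712 
    plot(mn.foo.byQ5bar)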
This is a fairly standard task. The cut2 function in Harrell's Hmisc package has a convenient g= argument that lets you do this by just specifying an integer for the number of groups. I also like it because the intervals it produces are left-closed, rather than right-closed as in R's default cut.
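As a rough sketch of the left-closed grouping described above: the base R lines below build quintile bins with `cut(..., right = FALSE)`, and the guarded block shows the one-call `Hmisc::cut2(bar, g = 5)` form, which only runs if Hmisc happens to be installed. The seed and the names `breaks`, `grp`, `grp2`, `mn.foo`, and `mn.foo2` are illustrative assumptions, not part of the original answer.

```r
set.seed(1)                      # assumed seed, only for reproducibility
foo <- rnorm(100)
bar <- rnorm(100)

# Base R: quintile break points, then left-closed bins [a, b).
# With right = FALSE, include.lowest = TRUE closes the last bin on the right,
# so the maximum of bar is not dropped.
breaks <- quantile(bar, probs = (0:5) / 5, na.rm = TRUE)
grp    <- cut(bar, breaks, right = FALSE, include.lowest = TRUE)
mn.foo <- tapply(foo, grp, mean)

# With Hmisc available, cut2() does the binning from the group count alone.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  grp2    <- Hmisc::cut2(bar, g = 5)
  mn.foo2 <- tapply(foo, grp2, mean)
}
```

Either vector of group means can then be passed to `plot()` or `barplot()` as in the answer.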
 
    
    
        IRTFM
        
- DWin, could you briefly explain (if you know) the idea behind such a plot? Plotting quantiles against the mean of the data within each quantile range. I can't think of the idea behind it... – Arun Mar 31 '13 at 18:48
- Some people seem to like barplots more than scatterplots. (I'm not in that category, and if the number of observations exceeds a few hundred I would use a hexbin plot or a 2d density plot for continuous-by-continuous comparisons.) This barplot approach might support visualizing a 4-degree-of-freedom chi-square GOF test (again, not what I would advise), where the independent variable is grouped by quintile and the corresponding foo-means are the bar heights. – IRTFM Mar 31 '13 at 19:01
5
            
            
You can combine a lot of these lines into more concise code, but here it is broken down:
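    library(ggplot2)   # provides qplot()
    # Sample Data:
    x <- 1:100;   y <- rnorm(x)
    # Number Of Groups
    N <- 5
    # quantiles: N+1 break points for y (N bins), N plotting positions for x
    Q.y <- quantile(y, probs=seq(0, 1, length=(N+1)))
    Q.x <- quantile(x, probs=seq(0, 1, length=N))
    # means of y within each quantile bin of y
    # (include.lowest=TRUE keeps the minimum of y in the first bin)
    means.y <- c(by(y, cut(y, Q.y, include.lowest=TRUE), mean))
    # plot them
    qplot(Q.x, means.y)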
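The code above bins y by its own quantiles. If the goal is the one stated in the question (bin the x variable and plot the mean of y per bin), a minimal base R variant might look like the following. The seed, the names `breaks.x`, `grp.x`, `mean.x`, and `mean.y`, and the choice to place each point at the mean of x within its bin are all assumptions made for illustration.

```r
set.seed(42)                     # assumed seed, only for reproducibility
x <- 1:100
y <- rnorm(100)
N <- 5                           # number of quantile groups

# N + 1 break points split x into N quantile groups
breaks.x <- quantile(x, probs = seq(0, 1, length.out = N + 1))
grp.x    <- cut(x, breaks.x, include.lowest = TRUE)

# One point per group: mean of x and mean of y within that group
mean.x <- tapply(x, grp.x, mean)
mean.y <- tapply(y, grp.x, mean)

plot(mean.x, mean.y, type = "b",
     xlab = "mean of x within quantile group",
     ylab = "mean of y within quantile group")
```

Plotting against the within-bin mean of x keeps the horizontal axis on the original scale of x; using the bin labels as a factor would work as well.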
 
    
    
        Ricardo Saporta
        