Here I make a new column to indicate whether myData is above or below its median
### MedianSplits based on Whole Data
#create some test data
myDataFrame=data.frame(myData=runif(15),myFactor=rep(c("A","B","C"),5)) 
#create column showing median split
myBreaks= quantile(myDataFrame$myData,c(0,.5,1))
myDataFrame$MedianSplitWholeData = cut(
    myDataFrame$myData,
    breaks=myBreaks, 
    include.lowest=TRUE,
    labels=c("Below","Above"))
#Check if it's correct
myDataFrame$AboveWholeMedian = myDataFrame$myData > median(myDataFrame$myData)
myDataFrame
Works fine. Now I want to do the same thing, but compute the median splits within each level of myFactor.
I've come up with this:
#Median splits within factor levels
byOutput=by(myDataFrame$myData,myDataFrame$myFactor, function (x) {
     myBreaks= quantile(x,c(0,.5,1))
     MedianSplitByGroup=cut(x,
       breaks=myBreaks, 
       include.lowest=TRUE,
       labels=c("Below","Above"))
     MedianSplitByGroup
     })
byOutput contains what I want. It categorizes each element of factors A, B, and C correctly. However I'd like to create a new column, myDataFrame$FactorLevelMedianSplit, that shows the newly-computed median split.
How do you convert the output of the "by" command into a useful data-frame column?
I think perhaps the "by" command is not R-like way to do this ...
Update:
With Thierry's example of how to use factor() cleverly, and upon discovering the "ave" function in Spector's book, I've found this solution, which requires no additional packages.
myDataFrame$MediansByFactor=ave(
    myDataFrame$myData,
    myDataFrame$myFactor,
    FUN=median)
myDataFrame$FactorLevelMedianSplit = factor(
    myDataFrame$myData>myDataFrame$MediansByFactor, 
    levels = c(TRUE, FALSE), 
    labels = c("Above", "Below"))
 
     
     
     
     
    