I am trying to bucket certain features into groups. The data.frame below (grouped) is my lookup "key" (think Excel VLOOKUP):
          Original  Grouped
1         Features Constant
2     PhoneService Constant
3    PhoneServices Constant
4       Surcharges Constant
5     CallingPlans Constant
6            Taxes Constant
7          LDUsage    Noise
8    RegionalUsage    Noise
9       LocalUsage    Noise
10       Late fees    Noise
11 SpecialServices    Noise
12         TFUsage    Noise
13       VoipUsage    Noise
14         CCUsage    Noise
15         Credits  Credits
16         OneTime  OneTime
I then reference my database, bill.data, which has a column (BillSection) whose values come from grouped$Original, and I want to map each value to the corresponding entry in grouped$Grouped. I use sapply to do the lookup, then cbind the resulting output to my original data.frame.
grouper<-as.character(sapply(as.character(bill.data$BillSection[1:100]), # for the first 100 records of the data.frame bill.data
       function(x)grouped[grouped$Original==x,2])) # take the second column, i.e. Grouped, for the corresponding "TRUE" value in Original
cbind(bill.data[1:100,],as.data.frame(grouper))
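To make this reproducible, here is a minimal self-contained sketch of the same approach. The shortened key and the toy bill.data (including its Amount column) are made up for illustration; my real data has many more rows and columns.

```r
# Lookup key: a shortened copy of the `grouped` table shown above
grouped <- data.frame(
  Original = c("Features", "Taxes", "LDUsage", "Late fees", "Credits", "OneTime"),
  Grouped  = c("Constant", "Constant", "Noise", "Noise", "Credits", "OneTime"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for bill.data; the Amount column is invented
bill.data <- data.frame(
  BillSection = c("Taxes", "LDUsage", "Credits", "Taxes", "OneTime"),
  Amount      = c(1.5, 0.2, -3, 2.1, 10),
  stringsAsFactors = FALSE
)

# Same per-row lookup as in my code: for each BillSection value,
# scan grouped$Original and take the matching entry of column 2 (Grouped)
grouper <- as.character(sapply(as.character(bill.data$BillSection),
                               function(x) grouped[grouped$Original == x, 2]))

# Attach the looked-up group as a new column
result <- cbind(bill.data, as.data.frame(grouper))
print(result)
```

Every row triggers a full scan of grouped$Original, which is presumably why this gets slow as the number of rows grows.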
The sapply code works as expected, but it is slow when I apply it to my whole database, which exceeds 10,000,000 unique records. Is there a faster alternative to this method? I know I could use plyr, but I think it is even slower than sapply. I tried to figure it out with data.table, but had no luck. Any suggestions would be helpful. I am also open to coding this in Python, which I am new to but have heard is much faster than R, since I deal with large datasets very often. Mainly, I want to know whether R can do this fast enough to be useful.
Thanks!
 
    