I am using the randomForest library in R via rpy2. I would like to bring the values calculated by the caret predict method back into Python and join them to the original pandas DataFrame. See the example below.
import pandas as pd
import numpy as np
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
r = robjects.r
r.library("randomForest")
r.library("caret")
df = pd.DataFrame(data=np.random.rand(100, 10), columns=["a{}".format(i) for i in range(10)])
df["b"] = ['a' if x < 0.5 else 'b' for x in np.random.sample(size=100)]
# Split on a0: .loc replaces the deprecated (and since removed) .ix indexer
train = df.loc[df.a0 < .75]
withheld = df.loc[df.a0 >= .75]
rf = r.randomForest(robjects.Formula('b ~ .'), data=train)
pr = r.predict(rf, withheld)
print(pr.rx())
Which returns
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
a a b b b a a a a b a a a a a b a a a a
Levels: a b
But how can I join this to the withheld dataframe, or compare it to the original values?
I have tried this:
import pandas.rpy.common as com
com.convert_robj(pr)
But this returns a dictionary whose keys are strings. I guess there is a workaround: call withheld.reset_index(), convert the dict keys to integers, and then join the two. But there must be a simpler way!
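For reference, the workaround I have in mind could be sketched roughly like this in plain pandas. It is only a sketch under assumptions: pred_dict is a hypothetical stand-in for the converted predictions, the small withheld frame is made-up illustration data, and I am assuming the string keys are 1-based row positions in the same order as the rows of withheld.

```python
import pandas as pd

# Made-up stand-in for the withheld rows; the index is deliberately
# non-contiguous, as it would be after filtering the original frame.
withheld = pd.DataFrame(
    {"a0": [0.80, 0.91, 0.77], "b": ["a", "b", "b"]},
    index=[4, 17, 42],
)

# Hypothetical stand-in for the converted predictions: string keys "1".."n"
# (assumed to be 1-based row positions) mapped to predicted labels.
pred_dict = {"1": "a", "2": "b", "3": "a"}

# Turn the dict into a Series indexed by 0-based integer position.
pred = pd.Series(pred_dict)
pred.index = pred.index.astype(int) - 1
pred = pred.sort_index()

# Attach by position: pass the bare values so pandas does not try to align
# on index labels, which differ between the two objects.
result = withheld.copy()
result["predicted"] = pred.to_numpy()

# Compare the predictions with the original labels.
result["correct"] = result["b"] == result["predicted"]
print(result)
```

This keeps the original index of withheld, so result can still be joined back to df on that index. It only works if the predictions really do come back in the same row order, which is the part I am unsure about.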