I have a for loop that takes a subsample of my original dataset, makes predictions with a previously fit model, and then needs to match each prediction to the target value from the original DataFrame so I can calculate a different value.
20 lines from the original subsample:
index       f0        f1         f2          product
89641   11.758713   -2.548885   5.007187    134.766305
30665   7.134050    -7.369558   3.990141    107.813044
71148   -13.860892  -2.727111   4.995418    137.945408
63263   -1.949113   6.340399    4.999270    134.766305
34301   2.741874    -5.114227   1.990971    57.085625
28150   -9.194978   -8.220917   4.000539    110.992147
37974   5.416532    -6.685454   3.997102    107.813044
63541   8.116958    -0.106199   1.992089    53.906522
69007   -0.886114   -8.732907   3.004329    84.038886
8808    -10.138814  -5.428649   3.996867    110.992147
77082   -7.427920   -9.558472   5.002233    137.945408
30523   0.780631    -1.872719   1.000312    30.132364
78523   3.096930    -6.854314   3.000831    84.038886
66519   4.459357    -6.787551   4.994414    134.766305
69231   10.113738   -10.433003  4.004866    107.813044
48418   -17.092959  -3.294716   1.999222    57.085625
59715   -0.970615   -1.741134   2.012687    57.085625
30159   -7.075355   -16.977595  4.997697    137.945408
34763   5.850225    -5.069475   2.994821    80.859783
99239   -8.493579   -8.126316   1.004643    30.132364
code:
r2_revenue = []
for i in range(1000):
    subsample = r2_test.sample(500,replace=True)
    features = subsample.drop(['product'],axis=1)
    predict = model2.predict(features)
    top_200 = pd.Series(predict).sort_values(ascending=False).iloc[:200]
    target = subsample['product'].isin(top_200)
    result = (revenue(target).sum())
    r2_revenue.append(result)
So my "target" needs to take the index of each top_200 entry and then find the corresponding entry in ['product'] from the original subsample.
I am striking out on finding a way to take the index numbers from the Series top_200 and find the corresponding product values in the original dataset.
I feel like I am missing something obvious, but searches like "matching an index from a series to a value in a dataframe" turn up results for a single DataFrame, not a Series to a DataFrame.
If I were looking up data I'd use .query(), but I don't know how to do that from an index to an index.
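For the index-to-index part, label-based selection with .loc looks like the closest analogue to .query(). Here is a minimal sketch on made-up data; the names df, pred, top and matched are hypothetical stand-ins, not my real objects:

```python
import pandas as pd

# Hypothetical stand-in for the subsample: unique index labels, one target column.
df = pd.DataFrame(
    {"product": [134.77, 107.81, 137.95, 57.09]},
    index=[89641, 30665, 71148, 34301],
)

# Hypothetical predictions that share the DataFrame's index labels.
pred = pd.Series([120.0, 95.0, 140.0, 60.0], index=df.index)

# Keep the two largest predictions; their index labels are carried along.
top = pred.sort_values(ascending=False).iloc[:2]

# Look up the matching 'product' values by index label.
matched = df.loc[top.index, "product"]
print(matched)
```

This only behaves as a one-to-one lookup when the index labels are unique. With sample(..., replace=True) the same label can appear several times, and .loc then returns every row carrying that label.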
Any input would be greatly appreciated!
Edit to help clarify (hopefully):
My Series top_200 holds predictions made from the subsample DataFrame. As I understand it, the index of the Series should be the same as the index of the subsample DataFrame. Based on the index of a particular row, I want to look up the value in the product column of the subsample DataFrame at that same index.
Here is an example output for that Series:
303    139.893243
203    138.886222
21     138.561583
296    138.535309
391    138.491757
The rows are 303, 203, 21, 296 and 391. I now want to get the value in the product column of the subsample DataFrame for rows 303, 203, 21, 296 and 391.
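One thing worth noting: the labels in the example output are all below 500, which suggests they are row positions within the subsample (pd.Series(predict) gets a fresh 0..499 index) rather than the original labels such as 89641. Under that assumption a positional lookup with .iloc may be what is needed. This is only a sketch: the data is randomly generated, and predict_stub is a hypothetical stand-in for model2.predict that I assume returns a plain NumPy array.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for r2_test (made-up numbers, same column names).
r2_test = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["f0", "f1", "f2"])
r2_test["product"] = 27.0 * r2_test["f2"] + rng.normal(size=1000)

def predict_stub(features):
    # Stand-in for model2.predict; assumed to return a plain NumPy array.
    return 27.0 * features["f2"].to_numpy()

subsample = r2_test.sample(500, replace=True, random_state=0)
features = subsample.drop(["product"], axis=1)
predict = predict_stub(features)

# pd.Series(predict) gets a default 0..499 index, so its labels are
# row positions within subsample, not the original index labels.
top_200 = pd.Series(predict).sort_values(ascending=False).iloc[:200]

# Positional lookup: one 'product' value per selected prediction,
# even when sampling with replacement has duplicated rows.
target = subsample["product"].iloc[top_200.index.to_numpy()]
print(len(target))
```

Using positions instead of labels avoids the duplicate-label problem that sampling with replacement creates, and target could then be passed to revenue() in place of the .isin() result.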
 
    