(update: added desired data frame)
Let me start by saying that I'm reasonably confident that I found a solution to this problem several years ago, but I have not been able to re-find that solution.
Questions that address similar problems, but don't solve my particular problem include:
- Efficiently select rows that match one of several values in Pandas DataFrame
- Efficiently adding calculated rows based on index values to a pandas DataFrame
- Compare Python Pandas DataFrames for matching rows
The Question
Let's say I have a data frame with many columns that I am working on:
big = pd.DataFrame({'match_1': [11, 12, 51, 52]})
big
   match_1
0       11
1       12
2       51
3       52
I also have smaller data frame that, in theory, maps some conditional statement to a desired value:
# A smaller dataframe that we use to map values into the larger dataframe
small = pd.DataFrame({'is_even': [True, False], 'score': [10, 200]})
small
  is_even  score
0    True     10
1   False    200
The goal here would be to use a conditional statement to match each row in big to a single row in small.  Assume that small is constructed such that there was always be one, and only one, match for each row in big.  (If there has to be multiple rows in small that match, just pick the first one.)
The desired output would be something like:
desired = pd.DataFrame({'match_1': [11, 12, 51, 52], 'metric': [200, 10, 200, 10]})
desired
 match_1  metric
0       11     200
1       12      10
2       51     200
3       52      10
I'm pretty sure that the syntax would look similar to:
big['score'] = small.loc[small['is_even'] == ( (big['match_1'] / 2) == 0), 'score']
This won't work because small['is_even'] is a Series of length 2, while ( (big['match_1'] / 2) == 0) is a Series of length 4.  What I'm looking to do is, for each row in big, find the one row in small that matches based on a conditional.
If I can get a sequence that contains the correct row in small that matches each row in big, then I could do something like:
`big['score'] = small.loc[matching_rows, 'score']
The question I have is: how do I generate the Sequence matching rows?
Things that (I think) aren't quite what I want:
If the columns in big and small were to match simply on constant values, this would be a straight forward use of either big.merge() or big.groupby(), however, in my case, the mapping can be an arbitrarily complex boolean conditional, for example:
(big['val1'] > small['threshold']) & (big['val2'] == small['val2']) & (big['val3'] > small['min_val']) & (big['val3'] < small['max_val'])
Solutions that rely on isin(), any(), etc, don't work, because the conditional check can be arbitrarily complex.
I could certainly create a function to apply() to the bigger DataFrame, but again, I'm pretty sure there was a simpler solution.
The answer may come down to 'calculate some intermediate columns until you can do a simple merge' or 'just use apply(), but I could swear that there was a way to do what I've described above.
 
     
     
    