I have a DataFrame containing columns that overlap in a sense:
import pandas as pd
df = pd.DataFrame({
    'Date': ['2017-04-02', '2017-04-02', '2017-04-02', '2017-04-02', '2017-04-02', '2017-04-02'],
    'Team': ['CHC', 'ARI', 'NYY', 'TBR', 'STL', 'SFG'],
    'Home': [True, False, True, False, False, True],
    'Opp': ['STL', 'SFG', 'TBR', 'NYY', 'CHC', 'ARI'],
    'Rslt': ['L', 'W', 'L', 'W', 'W', 'L']
})
df['Date'] = pd.to_datetime(df['Date'])
print(df)
OUTPUT:
        Date Team   Home  Opp Rslt
0 2017-04-02  CHC   True  STL    L
1 2017-04-02  ARI  False  SFG    W
2 2017-04-02  NYY   True  TBR    L
3 2017-04-02  TBR  False  NYY    W
4 2017-04-02  STL  False  CHC    W
5 2017-04-02  SFG   True  ARI    L
For the date of 2017-04-01, there were 3 games played. The DataFrame contains the game results for each day for each team. This results in 6 results. Take row 2 and 3, this is a game between NYY and TBR:
- Row 2gives the NYY result ofL, meaning they lost
- Row 3gives the TBR result ofW, meaning they won
What I'm trying to do is group all row pairs that relate to the same game. My initial idea was to create a new column that would act as a label for the pair and then use that to group on or set MultiIndex. I thought about it and considered concatenating the three columns into a single string for each row and then, using sets, look through all rows for each date in Date and find the other row that contains the same characters:
df['Match'] = df['Date'].dt.strftime('%Y-%m-%d') + ',' + df['Team'] + ',' + df['Opp']
print(df)
OUTPUT:
        Date Team   Home  Opp Rslt               Match
0 2017-04-02  CHC   True  STL    L  2017-04-02,CHC,STL
1 2017-04-02  ARI  False  SFG    W  2017-04-02,ARI,SFG
2 2017-04-02  NYY   True  TBR    L  2017-04-02,NYY,TBR
3 2017-04-02  TBR  False  NYY    W  2017-04-02,TBR,NYY
4 2017-04-02  STL  False  CHC    W  2017-04-02,STL,CHC
5 2017-04-02  SFG   True  ARI    L  2017-04-02,SFG,ARI
From here, I'm not sure how to proceed. I have a method in mind using sets that I've used in the past. If we focus on row 2 and 3 again, subtracting the sets of the string, split using the ,, and taking the bool() will return False for two sets containing the same string elements and True for anything else (different sets):
print(
    bool(set('2017-04-02,NYY,TBR'.split(',')) - set('2017-04-02,TBR,NYY'.split(',')))
)
print(
    bool(set('2017-04-02,NYY,TBR'.split(',')) - set('2017-04-02,CHC,STL'.split(',')))
)
OUTPUT:
False
True
Is there a better way to take a row value in a column and lookup all other row values in that same column and label the rows where they are related? The kind of label I would like to have is creating a unique numbering of games. Since these three games happen on the same day, labelling the pairs as 1, 2, 3 would be great so that each game pair for each day has a unique ID.
P.S. I've also seen this post that kinda looks like what I'm trying to do... I've tried using .isin() but kept running into errors so scrapped that approach. I thought about pd.DataFrame.lookup but I'm not quite sure if that's the right approach either. Just need a way to group up each pair of rows.
 
     
     
    
