I am using nba_py to get the scoreboard data for some NBA matches.
Below is an example of how the data are structured:
    SEASON |     GAME_DATE_EST | GAME_SEQUENCE | GAME_ID | HOME_TEAM_ID | VISITOR_TEAM_ID | WINNER
0   2013    2013-10-05T00:00:00     1            11300001   12321         1610612760        V
1   2013    2013-10-05T00:00:00     2            11300002   1610612754    1610612741        V
2   2013    2013-10-05T00:00:00     3            11300003   1610612745    1610612740        V
3   2013    2013-10-05T00:00:00     4            11300004   1610612747    1610612744        H
4   2013    2013-10-06T00:00:00     1            11300005   12324         1610612755        V
You can find a part of the data here: NBA Games Data.
My aim is to create and add to the original data the following columns:
For the hometeam:
   1. Total wins/losses for hometeam if hometeam plays at home ("HOMETEAM_HOME_WINS"/"HOMETEAM_HOME_LOSSES")
   2. Total wins/losses for hometeam if hometeam is visiting ("HOMETEAM_VISITOR_WINS"/"HOMETEAM_VISITOR_LOSSES")
For the visitor_team:
   3. Total wins/losses for visitor_team if visitor_team plays at home ("VISITOR_TEAM_HOME_WINS"/"VISITOR_TEAM_HOME_LOSSES")
   4. Total wins/losses for visitor_team if visitor_team is visiting ("VISITOR_TEAM_VISITOR_WINS"/"VISITOR_TEAM_VISITOR_LOSSES")
My first simplistic approach is below:
def get_home_team_home_wins(x):
    hometeam = x.HOME_TEAM_ID
    season = x.SEASON
    gid = x.name
    season_hometeam_games = grouped_seasons_hometeams.get_group((season, hometeam))
    home_games = season_hometeam_games[(season_hometeam_games.index < gid)]
    if not home_games.empty:
        try:
            home_wins = home_games.FTR.value_counts()["H"]
        except Exception as e:
            home_wins = 0
    else:
        home_wins = 0
grouped_seasons_hometeams = df.groupby(["SEASON", "HOME_TEAM_ID"])
df["HOMETEAM_HOME_WINS"] = df.apply(lambda x: get_home_team_home_wins(x), axis=1)
Another approach is iterating over the rows:
grouped_seasons = df.groupby("SEASON")
df["HOMETEAM_HOME_WINS"] = 0
current_season = 0
for index,row in df.iterrows():
    season = row.SEASON
    if season != current_season:
        current_season = season
        season_games = grouped_seasons.get_group(current_season)
    hometeam = row.HOME_TEAM_ID
    gid = row.name
    games = season_games[(season_games.index < gid)]
    home_games = games[(games.HOME_TEAM_ID == hometeam)]
    if not home_games.empty:
        try:
            home_wins = home_games.FTR.value_counts()["H"]
        except Exception as e:
            home_wins = 0
    else:
        home_wins = 0
    row["HOME_TEAM_HOME_WINS_4"] = home_wins
    df.ix[index] = row
Update 1:
Below there are functions for getting wins/losses for hometeam if it plays at home and if it visits. A similar one would be for the visitor_team.
def get_home_team_home_wins_losses(x):
    hometeam = x.HOME_TEAM_ID
    season = x.SEASON
    gid = x.name
    games = df[(df.SEASON == season) & (df.index < gid)]
    home_team_home_games = games[(games.HOME_TEAM_ID == hometeam)]  
    # HOMETEAM plays at home
    if not home_team_home_games.empty:
        home_team_home_games_value_counts = home_team_home_games.FTR.value_counts()
        try:
            home_team_home_wins = home_team_home_games_value_counts["H"]
        except Exception as e:
            home_team_home_wins = 0
        try:
            home_team_home_losses = home_team_home_games_value_counts["V"]
        except Exception as e:
            home_team_home_losses = 0
    else:
        home_team_home_wins = 0
        home_team_home_losses = 0
    return [home_team_home_wins, home_team_home_losses]
def get_home_team_visitor_wins_losses(x):
    hometeam = x.HOME_TEAM_ID
    season = x.SEASON
    gid = x.name
    games = df[(df.SEASON == season) & (df.index < gid)]
    home_team_visitor_games = games[(games.VISITOR_TEAM_ID == hometeam)]
    # HOMETEAM visits
    if not home_team_visitor_games.empty:
        home_team_visitor_games_value_counts = home_team_visitor_games.FTR.value_counts()
        try:
            home_team_visitor_wins = home_team_visitor_games_value_counts["V"]
        except Exception as e:
            home_team_visitor_wins = 0
        try:
            home_team_visitor_losses = home_team_visitor_games_value_counts["H"]
        except Exception as e:
            home_team_visitor_losses = 0
    else:
        home_team_visitor_wins = 0
        home_team_visitor_losses = 0    
    return [home_team_visitor_wins, home_team_visitor_losses]
df["HOME_TEAM_HOME_WINS"], df["HOME_TEAM_HOME_LOSSES"] = zip(*df.apply(lambda x: get_home_team_home_wins_losses(x), axis=1))
df["HOME_TEAM_VISITOR_WINS"], df["HOME_TEAM_VISITOR_LOSSES"] = zip(*df.apply(lambda x: get_home_team_visitor_wins_losses(x), axis=1))
df["HOME_TEAM_WINS"] = df["HOME_TEAM_HOME_WINS"] + df["HOME_TEAM_VISITOR_WINS"]
df["HOME_TEAM_LOSSES"] = df["HOME_TEAM_HOME_LOSSES"] + df["HOME_TEAM_VISITOR_LOSSES"]
The above approaches are not efficient. So, I am thinking of using groupby but it's not really clear how.
I will add updates whenever I find something more efficient.
Any ideas ? Thanks.
 
    