I have 2 DFs that have different sizes, something like:
DF1:
Metric Market BSegment MSegment Category Num Den Period
 KPI1   GA     PS       Comm      BF     20  100 202201
 KPI1   CE     PR       Cons      BF     70  128 202201
 KPI2   NWE    PR       Cons      On     12  389 202203
  .
  .
  .
 KPI20  ISE    PS       Cons      On     80  288 202204
 KPI30  GA     PS       Comm      BF     32  85  202204
DF2:
Metric Market BSegment MSegment Category StartPeriod EndPeriod Goal
 KPI1   *      PS       Comm      BF        202201     202205   5
 KPI1   *      PR       Cons      BF        202201     202205   10
 KPI2   NWE    PR        *        On        202112     202203   60
  .
  .
  .
 KPI2   ISE    PS        *        On        202112     202203   40
 KPI3   GA     PS       Comm      BF        202201     202206   22
 KPI3   CE     PS       Comm      BF        202201     202206   26
 KPI3   NWE    PS       Comm      BF        202201     202206   20
I am trying to get a new column in DF1 where I bring the values of the 'Goal' column in DF2 when the other columns are matchig. The main problem is that in DF2, in order to not repeat the same line multiple times, when a goal is applied, for example, to all markets instead of a line per market it states '*'.
Also there could be metrics in DF1 that are not appearing in DF2 as there are no goals.
The logic would be something like:
(df1['Metric'] == df2['Metric']) &
(df1['Market'] == df2['Market'] | df2['Market'] == '*') & 
(df1['BSegment'] == df2['BSegment'] | df2['BSegment'] == '*') &
(df1['MSegment'] == df2['MSegment'] | df2['MSegment'] == '*') &
(df1['Category'] == df2['Category'] | df2['Category'] == '*') &
(df2['StartPeriod'] <= df1['Period'] <= df2['EndPeriod'] )
I've tried with np.where and np.select as:
df1['Numerator.Goal'] = (np.where((df1['Metric'] == df2['Metric']) &
(df1['Market'] == df2['Market'] | df2['Market'] == '*') & 
(df1['BSegment'] == df2['BSegment'] | df2['BSegment'] == '*') &
(df1['MSegment'] == df2['MSegment'] | df2['MSegment'] == '*') &
(df1['Category'] == df2['Category'] | df2['Category'] == '*') &
(df2['StartPeriod'] <= df1['Period'] <= df2['EndPeriod'] ), df2['Goal'], np.nan))
--------
df1['Numerator.Goal'] = (np.select((df1['Metric'] == df2['Metric']) &
(df1['Market'] == df2['Market'] | df2['Market'] == '*') & 
(df1['BSegment'] == df2['BSegment'] | df2['BSegment'] == '*') &
(df1['MSegment'] == df2['MSegment'] | df2['MSegment'] == '*') &
(df1['Category'] == df2['Category'] | df2['Category'] == '*') &
(df2['StartPeriod'] <= df1['Period'] <= df2['EndPeriod'] ), df2['Goal']))
but it ends up with a "ValueError: Can only compare identically-labeled Series objects"
I've think also on doing a .merge() but I havn't seen how to deal with the '*' symbol so I have not been successful at it.
By any chance, would you know how to accomplish this? Any help is appreciated!
 
     
    