Pandas - cross columns reference

Question

My data is a bit complicated, I separate into 2 sections: (A) Explain data, (B) Desire output

(A) - Explain data:

My data as follow:

   comp       date   adj_date  val
0     a 1999-12-31        NaT   50
1     a 2000-01-31        NaT   51
2     a 2000-02-29        NaT   52
3     a 2000-03-31        NaT   53
4     a 2000-04-30        NaT   54
5     a 2000-05-31        NaT   55
6     a 2000-06-30        NaT   56
----------------------------------
7     a 2000-07-31 2000-01-31   57
8     a 2000-08-31 2000-02-29   58
9     a 2000-09-30 2000-03-31   59
10    a 2000-10-31 2000-04-30   60
11    a 2000-11-30 2000-05-31   61
12    a 2000-12-31 2000-06-30   62
13    a 2001-01-31 2000-07-31   63
14    a 2001-02-28 2000-08-31   64
15    a 2001-03-31 2000-09-30   65
16    a 2001-04-30 2000-10-31   66
17    a 2001-05-31 2000-11-30   67
18    a 2001-06-30 2000-12-31   68
----------------------------------
19    a 2001-07-31 2001-01-31   69
20    a 2001-08-31 2001-02-28   70
21    a 2001-09-30 2001-03-31   71
22    a 2001-10-31 2001-04-30   72
23    a 2001-11-30 2001-05-31   73
24    a 2001-12-31 2001-06-30   74
25    a 2002-01-31 2001-07-31   75
26    a 2002-02-28 2001-08-31   76
27    a 2002-03-31 2001-09-30   77
28    a 2002-04-30 2001-10-31   78
29    a 2002-05-31 2001-11-30   79
30    a 2002-06-30 2001-12-31   80
----------------------------------
31    a 2002-07-31 2002-01-31   81
32    a 2002-08-31 2002-02-28   82
33    a 2002-09-30 2002-03-31   83
34    a 2002-10-31 2002-04-30   84
35    a 2002-11-30 2002-05-31   85
36    a 2002-12-31 2002-06-30   86
37    a 2003-01-31 2002-07-31   87
38    a 2003-02-28 2002-08-31   88
39    a 2003-03-31 2002-09-30   89
40    a 2003-04-30 2002-10-31   90
41    a 2003-05-31 2002-11-30   91
42    a 2003-06-30 2002-12-31   92
----------------------------------

date: is the actual date, as end of month.
adj_date = date + MonthEnd(-6)
val: is given value

I want to create new column val_new where:

it is referencing to val of previous year December
val_new is then applied to date as from date.July to date.(year+1).June, Or equivalently in adj_date it is from adj_date.Jan to adj_date.Dec

(B) - Desire Output:

   comp       date   adj_date  val  val_new
0     a 1999-12-31        NaT   50      NaN
1     a 2000-01-31        NaT   51      NaN
2     a 2000-02-29        NaT   52      NaN
3     a 2000-03-31        NaT   53      NaN
4     a 2000-04-30        NaT   54      NaN
5     a 2000-05-31        NaT   55      NaN
6     a 2000-06-30        NaT   56      NaN
-------------------------------------------
7     a 2000-07-31 2000-01-31   57     50.0
8     a 2000-08-31 2000-02-29   58     50.0
9     a 2000-09-30 2000-03-31   59     50.0
10    a 2000-10-31 2000-04-30   60     50.0
11    a 2000-11-30 2000-05-31   61     50.0
12    a 2000-12-31 2000-06-30   62     50.0
13    a 2001-01-31 2000-07-31   63     50.0
14    a 2001-02-28 2000-08-31   64     50.0
15    a 2001-03-31 2000-09-30   65     50.0
16    a 2001-04-30 2000-10-31   66     50.0
17    a 2001-05-31 2000-11-30   67     50.0
18    a 2001-06-30 2000-12-31   68     50.0
-------------------------------------------
19    a 2001-07-31 2001-01-31   69     62.0
20    a 2001-08-31 2001-02-28   70     62.0
21    a 2001-09-30 2001-03-31   71     62.0
22    a 2001-10-31 2001-04-30   72     62.0
23    a 2001-11-30 2001-05-31   73     62.0
24    a 2001-12-31 2001-06-30   74     62.0
25    a 2002-01-31 2001-07-31   75     62.0
26    a 2002-02-28 2001-08-31   76     62.0
27    a 2002-03-31 2001-09-30   77     62.0
28    a 2002-04-30 2001-10-31   78     62.0
29    a 2002-05-31 2001-11-30   79     62.0
30    a 2002-06-30 2001-12-31   80     62.0
-------------------------------------------
31    a 2002-07-31 2002-01-31   81     74.0
32    a 2002-08-31 2002-02-28   82     74.0
33    a 2002-09-30 2002-03-31   83     74.0
34    a 2002-10-31 2002-04-30   84     74.0
35    a 2002-11-30 2002-05-31   85     74.0
36    a 2002-12-31 2002-06-30   86     74.0
37    a 2003-01-31 2002-07-31   87     74.0
38    a 2003-02-28 2002-08-31   88     74.0
39    a 2003-03-31 2002-09-30   89     74.0
40    a 2003-04-30 2002-10-31   90     74.0
41    a 2003-05-31 2002-11-30   91     74.0
42    a 2003-06-30 2002-12-31   92     74.0
-------------------------------------------

I have two solutions, but both comes at a cost:

Solution 1: to create sub_dec dataframe where we take val of Dec each year. Then merge back to main data. This one works fine, but I don't like this solution because our actual data will involve a lot of merge, and it is not easy and convenient to keep track of all those merges.
Solution 2: (1) I create a lag by shift(7), (2) set other adj_date but Jan to None, (3) then use groupby with ffill. This solution works nicely, but if there is any missing rows, or the date is not continuous, then the entire output is wrong

create `adj_year`:

data['adj_year'] = data['adj_date'].dt.year

cross referencing by `shift(7)`:

data['val_new'] = data.groupby('comp')['val'].shift(7)

setting other `adj_date` except Jan to be `None`:

data.loc[data['adj_date'].dt.month != 1, 'val_new'] = None

using ffill to fill in None by each group of `['comp', 'adj_year']`:

data['val_new'] = data.groupby(['comp', 'adj_year'])['val_new'].ffill()

If you have any suggestion to overcome the drawback of Solution 02, or any other new solution is appreciated.

Thank you

score 0 · Answer 1 · answered Mar 26 '22 at 11:22

0

You can use Timedelta with correct conversion from seconds to months, according to your needs ,

check these two resources for more info:

answered Mar 26 '22 at 11:22

Lorenzo Bassetti

795
10
15

Pandas - cross columns reference

create adj_year:

cross referencing by shift(7):

setting other adj_date except Jan to be None:

using ffill to fill in None by each group of ['comp', 'adj_year']:

1 Answers1

create `adj_year`:

cross referencing by `shift(7)`:

setting other `adj_date` except Jan to be `None`:

using ffill to fill in None by each group of `['comp', 'adj_year']`: