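Here are two versions of the solution, a slow one and a fast one, timed on a DataFrame with `len(df) = 3300000`. Both set `flag1` to 1 on every fifth consecutive 1 in `flag` (the 5th, 10th, ... row of each run of 1s) and restart the count whenever `flag` is 0.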
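Slow:

```python
%%time
d = 1  # position of the current row within the current run of 1s
for i, v in df.iterrows():
    if (v.flag == 1) and (d < 5):
        df.at[i, 'flag1'] = 0
        d += 1
    elif v.flag == 1:
        # 5th consecutive 1: mark it and restart the count
        df.at[i, 'flag1'] = 1
        d = 1
    else:
        # a 0 ends the run
        df.at[i, 'flag1'] = 0
        d = 1
# flag1 is built cell by cell with .at and can come out as float, so flag2 holds an int copy
df['flag2'] = df['flag1'].astype(int)
```

Wall time: 4min 27s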
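Most of the cost in the slow version is probably the per-row `iterrows()` and `df.at` calls rather than the counter itself. A minimal sketch, assuming `flag` holds only 0/1: keep exactly the same if/elif/else counter, but walk a plain Python list and assign the whole column once at the end. The helper name `every_fifth_one` is hypothetical and I have not timed this on the 3.3M-row frame.

```python
import pandas as pd


def every_fifth_one(flags, n=5):
    """Return 1 on every n-th consecutive 1 in `flags`, else 0 (hypothetical helper)."""
    out = []
    d = 1  # position of the current row within the current run of 1s
    for f in flags:
        if f == 1 and d < n:
            out.append(0)
            d += 1
        elif f == 1:
            out.append(1)  # n-th consecutive 1: mark it and restart the count
            d = 1
        else:
            out.append(0)  # a 0 ends the run
            d = 1
    return out


df = pd.DataFrame({'flag': [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]})
# One column assignment instead of one .at call per row
df['flag1'] = every_fifth_one(df['flag'].tolist())
print(df['flag1'].tolist())  # [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

Because the column is assigned from a list of Python ints, `flag1` is already an integer column, so no `astype(int)` copy is needed.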
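Fast:

```python
%%time
d = 1  # counter shared with the comprehension through the walrus operator (Python 3.8+)
df['flag1'] = [
    (0, (d := 1))[0] if df.at[i, 'flag'] == 0   # a 0 resets the count
    else (0, (d := d + 1))[0] if d % 5 != 0      # 1st to 4th 1 of the run
    else (1, (d := 1))[0]                        # 5th 1: mark it and reset
    for i in range(len(df))                      # assumes the default RangeIndex 0..n-1
]
```

Wall time: 1min 1s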
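A different technique from the two loops above, offered as an untimed sketch: a vectorized pandas version. It labels each run of equal consecutive values with `shift()`/`cumsum()`, numbers the rows inside each run with `groupby().cumcount()`, and marks rows where `flag == 1` and the 1-based position is a multiple of 5. Since the loops reset the counter after every mark, "every 5th 1 of a run" is the same as "position % 5 == 0 inside a run of 1s". The function name is hypothetical, and it assumes `flag` only holds 0/1.

```python
import pandas as pd


def every_nth_one_vectorized(flag: pd.Series, n: int = 5) -> pd.Series:
    # New run id whenever the value differs from the previous row
    # (the first row compares against NaN, so it always starts run 1).
    run_id = flag.ne(flag.shift()).cumsum()
    # 1-based position of each row inside its run
    pos = flag.groupby(run_id).cumcount() + 1
    # Mark the n-th, 2n-th, ... row of every run of 1s
    return ((flag == 1) & (pos % n == 0)).astype(int)


df = pd.DataFrame({'flag': [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
                            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})
df['flag1'] = every_nth_one_vectorized(df['flag'])
print(df.index[df['flag1'] == 1].tolist())  # [6, 16, 21]
```

The marked rows 6, 16 and 21 agree with `flag1` in the table below.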
First 50 rows of the result (ignore the `new` column):
| | flag | flag1 | flag2 | new |
|---:|---:|---:|---:|---:|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 |
| 3 | 1 | 0 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 |
| 5 | 1 | 0 | 0 | 0 |
| 6 | 1 | 1 | 1 | 1 |
| 7 | 1 | 0 | 0 | 0 |
| 8 | 1 | 0 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 |
| 10 | 0 | 0 | 0 | 0 |
| 11 | 0 | 0 | 0 | 0 |
| 12 | 1 | 0 | 0 | 0 |
| 13 | 1 | 0 | 0 | 0 |
| 14 | 1 | 0 | 0 | 0 |
| 15 | 1 | 0 | 0 | 0 |
| 16 | 1 | 1 | 1 | 1 |
| 17 | 1 | 0 | 0 | 0 |
| 18 | 1 | 0 | 0 | 0 |
| 19 | 1 | 0 | 0 | 0 |
| 20 | 1 | 0 | 0 | 0 |
| 21 | 1 | 1 | 1 | 0 |
| 22 | 1 | 0 | 0 | 0 |
| 23 | 1 | 0 | 0 | 0 |
| 24 | 1 | 0 | 0 | 0 |
| 25 | 0 | 0 | 0 | 0 |
| 26 | 0 | 0 | 0 | 0 |
| 27 | 1 | 0 | 0 | 0 |
| 28 | 0 | 0 | 0 | 0 |
| 29 | 1 | 0 | 0 | 0 |
| 30 | 1 | 0 | 0 | 0 |
| 31 | 1 | 0 | 0 | 0 |
| 32 | 1 | 0 | 0 | 0 |
| 33 | 0 | 0 | 0 | 0 |
| 34 | 0 | 0 | 0 | 0 |
| 35 | 1 | 0 | 0 | 0 |
| 36 | 1 | 0 | 0 | 0 |
| 37 | 1 | 0 | 0 | 0 |
| 38 | 1 | 0 | 0 | 0 |
| 39 | 1 | 1 | 1 | 1 |
| 40 | 1 | 0 | 0 | 0 |
| 41 | 1 | 0 | 0 | 0 |
| 42 | 0 | 0 | 0 | 0 |
| 43 | 0 | 0 | 0 | 0 |
| 44 | 0 | 0 | 0 | 0 |
| 45 | 1 | 0 | 0 | 0 |
| 46 | 1 | 0 | 0 | 0 |
| 47 | 1 | 0 | 0 | 0 |
| 48 | 1 | 0 | 0 | 0 |
| 49 | 1 | 1 | 1 | 1 |
For testing purposes, this is how I generated the data:

```python
import pandas as pd

# 33-value pattern repeated 100000 times -> 3,300,000 rows
A = [0,0,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1,1,1,1]
A = A * 100000
df = pd.DataFrame({'flag': A})
```
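Because the 33-value pattern starts with two 0s, no run of 1s crosses from one copy into the next. The correct `flag1` for the whole 3.3M-row frame is therefore the result for a single copy, tiled 100000 times. A minimal sketch of using that as a cheap reference to check any of the versions above; `REPEATS`, `block` and `expected` are names I introduce for illustration.

```python
import numpy as np
import pandas as pd

A = [0,0,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1,1,1,1]
REPEATS = 100000  # hypothetical name for the repeat count used above
df = pd.DataFrame({'flag': A * REPEATS})

# flag1 for one copy of the pattern, using the same counter rule as the loops
block = []
d = 1
for f in A:
    if f == 1 and d < 5:
        block.append(0)
        d += 1
    elif f == 1:
        block.append(1)
        d = 1
    else:
        block.append(0)
        d = 1

# A starts with 0, so the full result is just this block repeated
expected = np.tile(block, REPEATS)

print(np.flatnonzero(block).tolist())  # [6, 16, 21]
print(int(expected.sum()))             # 300000
```

After running either version, `np.array_equal(df['flag1'].to_numpy(), expected)` should be `True`; the comparison works whether `flag1` came out as int or float.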