I have a huge panel data, more than 10 GB. The data looks like the following:
ID    Start_time  Factor  End_time
1001     1611      0.12   1861
1001     1612     -0.01   1862
...
1001     1861      0.57   2111
1001     1862      0.06   2112
...
1002     1200      0.01   1450
1002     1201      0.52   1451
...
1002     1450     -0.21   1700
1002     1451      0.30   1701
...
The data is sorted by ID and Start_time value. I want to calculate the sum of Factor that for each ID from Start_time to the corresponding End_time.
An example of output:
ID    Start_time  Factor  End_time  Cumulative_factor
1001     1611      0.12   1861      0.12+(-0.01)+...+0.57
1001     1612     -0.01   1862      -0.01+...+0.57+0.06
...
1001     1861      0.57   2111      0.57+0.06+...
1001     1862      0.06   2112      0.06+...
...
1002     1200      0.01   1450      0.01+0.52+...+(-0.21)
1002     1201      0.52   1451      0.52+...+(-0.21)+0.30
...
1002     1450     -0.21   1700      -0.21+0.30+...
1002     1451      0.30   1701      0.30+...
...
Since I have more than 10 million observation, Is there a efficient way to calculate it?
 
     
    