I have a data set which includes seller ID, and number of the total sold TV in each year. An example of this data frame is shown below:
ID <- c(1006332,1010660,1010852,1012960,1012960,1012960,1012960,1012960,1013515,1013515,1013515,1013515,101351)
Sold_year <- c(2017,2016,2011,2011,2012,2013,2015,2016,2014,2015,2016,2018,2019)
n <- c(1,1,1,12,1,3,4,1,2,1,3,1,2)
data <- data.frame(ID,Sold_year,n)
        ID Sold_year  n
1  1006332      2017  1
2  1010660      2016  1
3  1010852      2011  1
4  1012960      2011 12
5  1012960      2012  1
6  1012960      2013  3
7  1012960      2015  4
8  1012960      2016  1
9  1013515      2014  2
10 1013515      2015  1
11 1013515      2016  3
12 1013515      2018  1
13 1013515      2019  2
what I want to do: I want to calculate the average gradient of sold TV for each seller. So for example: for the seller 1013515, I want to do the following calculation:
grad1 = (1-2)/(2015-2014)=-1
grad2 = (3-1)/(2016-2015)= 2
grad3 = (1-3)/(2018-2016)=-1
grad4 = (2-1)/(2019-2018)= 1
average= -1+2-1+1/(4)=0.25
Also, for a seller who has sold a TV only in one year, I consider the gradient as 0 and will add a penalty which is the difference between the sold year and 2020. So for seller 1006332 this average gradient would be
0-(2020-2017)=-3
What I did so far: in order to do the following calculation, I used a for loop to go through a list of specific agent each time and calculate the gradient. This approach works and give the right result, however, it's slow for large datasets. I was thinking maybe there is a way to do it simply with dplyr and data frame calculations. I was thinking to use lag function and here is what I proposed:
test2 <- data %>% 
  group_by(ID) %>% 
  mutate(grad=(n-lag(n))/(Sold_year-lag(Sold_year)))
the result of this code is:
 1                    1006332      2017     1  NA  
 2                    1010660      2016     1  NA  
 3                    1010852      2011     1  NA  
 4                    1012960      2011    12  NA  
 5                    1012960      2012     1 -11  
 6                    1012960      2013     3   2  
 7                    1012960      2015     4   0.5
 8                    1012960      2016     1  -3  
 9                    1013515      2014     2  NA  
10                    1013515      2015     1  -1  
11                    1013515      2016     3   2  
12                    1013515      2018     1  -1  
13                    1013515      2019     2   1  
As you see, it successfully calculates the gradient for the seller with more than one working year; however, I am not sure how I should calculate the final average gradient and also how I should add a penalty to those sellers with one working year(seller : 1006332, 1010660, etc)
the expected output should be in this form:
                                      ID                       Ave_grad
1                                        1006332                      -3.00
2                                        1010660                      -4.00
3                                        1010852                      -9.00
4                                        1012960                      -2.87
5                                        1013515                       0.25
any suggestions or comments?
Thanks
 
    