I'm currently working with a large data frame of 75 columns and roughly 9,100 rows. It contains observations for every day from 1995 to 2019 for several observation points.
Edit: the output of dput(head(df)):
> dput(head(df))
structure(list(date = structure(c(9131, 9132, 9133, 9134, 9135, 
9136), class = "Date"), x1 = c(50.75, 62.625, 57.25, 56.571, 
36.75, 39.125), x2 = c(62.25, 58.714, 49.875, 56.375, 43.25, 
41.625), x3 = c(90.25, NA, 70.125, 75.75, 83.286, 98.5), 
    x4 = c(60, 72, 68.375, 65.5, 63.25, 55.875), x5 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), xn = c(53.25, 
    61.143, 56.571, 58.571, 36.25, 44.375), year = c(1995, 1995, 1995, 1995, 
    1995, 1995), month = c(1, 1, 1, 1, 1, 1), day = c(1, 2, 3, 
    4, 5, 6)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", 
"data.frame"))
A sample of the data frame looks like this:
  date          x1    x2    x3    x4    x5    xn  year month   day
  <date>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1995-01-01  50.8  62.2  90.2  60      NA  53.2  1995     1     1
2 1999-08-02  62.6  58.7  NA    72      NA  61.1  1999     8     2
3 2001-09-03  57.2  49.9  70.1  68.4    NA  56.6  2001     9     3
4 2008-05-04  56.6  56.4  75.8  65.5    NA  58.6  2008     5     4
5 2012-04-05  36.8  43.2  83.3  63.2    NA  36.2  2012     4     5
6 2019-12-31  39.1  41.6  98.5  55.9    NA  44.4  2019    12    31
str(df)
tibble [9,131 x 75] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ date   : Date[1:9131], format: "1995-01-01" "1995-01-02" ...
 $ x1     : num [1:9131] 50.8 62.6 57.2 56.6 36.8 ...
 $ x2     : num [1:9131] 62.2 58.7 49.9 56.4 43.2 ...
   xn
 $ year   : num [1:9131] 1995 1995 1995 1995 1995 ...
 $ month  : num [1:9131] 1 1 1 1 1 1 1 1 1 1 ...
 $ day    : num [1:9131] 1 2 3 4 5 6 7 8 9 10 ...
My goal is to get, for every observation point (x1 to xn), the number of observations per year that exceed a certain limit. So far I have tried to do this with the aggregate() function.
To get the mean for every year, I used the following command:
aggregate(list(df), by=list(year=df$year), mean, na.rm=TRUE)
This works perfectly: I get the mean for every year for every observation point.
To get the count for one station, I used the following code:
aggregate(list(x1=df$x1), by=list(year=df$year), function(x) sum(rle(x)$values>120, na.rm=TRUE))
which produces this output:
   year      x1
1  1995      52
2  1996      43
3  1997      44
4  1998      42
5  1999      38
6  2000      76
7  2001      52
8  2002      58
9  2003     110
10 2004      34
11 2005      64
12 2006      46
13 2007      46
14 2008      17
15 2009      41
16 2010      30
17 2011      40
18 2012      47
19 2013      40
20 2014      21
21 2015      56
22 2016      27
23 2017      45
24 2018      22
25 2019      45
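For reference, here is a minimal sketch of what the anonymous function counts, run on a made-up vector (the values are hypothetical, not from my data). As far as I understand it, rle() collapses runs of identical consecutive values, so the result is the number of runs above the limit, which can differ from the number of individual observations above it:

```r
# Hypothetical toy vector, not taken from the real data
x <- c(130, 130, NA, 125, 100, 121)

# rle() collapses runs of identical consecutive values; each NA is its own run
runs <- rle(x)

# Number of runs whose value exceeds the limit of 120
n_runs <- sum(runs$values > 120, na.rm = TRUE)

# Number of individual observations above 120, for comparison
n_obs <- sum(x > 120, na.rm = TRUE)

print(c(runs = n_runs, observations = n_obs))
```

Here the two repeated 130s form a single run, so the run count is one lower than the observation count.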
So far, so good. I know I could extend the code by adding further entries (x2=df$x2, x3=df$x3, ..., xn) to the list argument in the code above; I tried that and it works.
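For illustration, the manually extended call looks roughly like the sketch below. The data frame toy and the helper name count_runs are hypothetical stand-ins so that the snippet runs on its own; with the real data there would be one list entry per station:

```r
# Hypothetical toy data standing in for df (values are made up)
toy <- data.frame(
  year = c(1995, 1995, 1995, 1996, 1996, 1996),
  x1   = c(121, 130, 100, 90, 125, 125),
  x2   = c(80, 122, 123, NA, 150, 60)
)

# Same counting function as in the single-station call
count_runs <- function(x) sum(rle(x)$values > 120, na.rm = TRUE)

# One list entry per station; this is the part that gets tedious
res <- aggregate(list(x1 = toy$x1, x2 = toy$x2),
                 by = list(year = toy$year),
                 FUN = count_runs)
print(res)
```

This gives one row per year and one column per listed station, but every station has to be typed out by hand.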
But how do I get them all at once?
I tried the following calls:
aggregate(.~(date, year, month, day), by=list(year=df$year), function(x) sum(rle(x)$values>120, na.rm=TRUE))
Error: unexpected ',' in "aggregate(.~(date,"
aggregate(.~date+year+month+day, by=list(year=df$year), function(x) sum(rle(x)$values>120, na.rm=TRUE))
Error in as.data.frame.default(data, optional = TRUE) : 
  cannot coerce class ‘"function"’ to a data.frame
aggregate(. ~ date + year + month + day, data = df,by=list(year=df$year), function(x) sum(rle(x)$values>120, na.rm=TRUE))
Error in aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : 
  arguments must have same length
Unfortunately, none of them works. Could someone please give me a hint as to where my mistake is?
 
     
    