I have a dataframe called 'madrid' that holds measurements of air pollutants along with the date, year, month, and day. The data runs from 2001 to 2018:
'data.frame':    3808224 obs. of  15 variables:
 $ date   : Date, format: "2001-08-01" "2001-08-01" "2001-08-01" ...
 $ BEN    : num  NA 1.5 NA NA NA ...
 $ CO     : num  0.37 0.34 0.28 0.47 0.39 ...
 $ EBE    : num  NA 1.49 NA NA NA ...
 $ NMHC   : num  NA 0.07 NA NA NA ...
 $ NO_2   : num  58.4 56.2 50.7 69.8 22.8 ...
 $ O_3    : num  34.5 42.2 46.3 40.7 66.3 ...
 $ PM10   : num  105 100.6 100.1 69.8 75.2 ...
 $ SO_2   : num  6.34 8.11 7.85 6.46 8.8 ...
 $ TCH    : num  NA 1.24 NA NA NA ...
 $ TOL    : num  NA 10.8 NA NA NA ...
 $ station: int  28079001 28079035 28079003 28079004 28079039 28079006 28079007 28079009 28079038 28079011 ...
 $ year   : num  2001 2001 2001 2001 2001 ...
 $ month  : num  8 8 8 8 8 8 8 8 8 8 ...
 $ day    : int  1 1 1 1 1 1 1 1 1 1 ...
I'm now trying to create a plot that shows the monthly average of each pollutant, summed into a single total: ideally one line/curve with the months on the x-axis, to see if there's seasonality. So far, I've done this with dplyr:
madrid_season <- madrid %>%
  group_by(month, year) %>%
  summarise(BEN_mean = mean(BEN, na.rm = TRUE),
            CO_mean = mean(CO, na.rm = TRUE),
            EBE_mean = mean(EBE, na.rm = TRUE),
            NMHC_mean = mean(NMHC, na.rm = TRUE),
            NO_2_mean = mean(NO_2, na.rm = TRUE),
            O_3_mean = mean(O_3, na.rm = TRUE),
            PM10_mean = mean(PM10, na.rm = TRUE),
            SO_2_mean = mean(SO_2, na.rm = TRUE),
            TCH_mean = mean(TCH, na.rm = TRUE),
            TOL_mean = mean(TOL, na.rm = TRUE)) 
Then I sum the averages (excluding the month and year columns, which are columns 1 and 2 of the summary):
madrid_season$tot <- rowSums(madrid_season[, 3:12], na.rm = TRUE)
Then I try to plot it:
p4 <- ggplot(madrid_season, aes(x = month, y = tot)) + geom_point() + geom_line()
However, this plots a separate total for every month of every year, rather than a single value per month pooled across all years, so I end up with a wacky-looking plot.
I think I need help using dplyr in this case (though I'm open to other suggestions!).
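
For reference, here is a minimal sketch of one possible way to get a single value per month: group by month only, so that all years are pooled. It runs on a small made-up data frame (`madrid_demo`, a hypothetical stand-in with just two pollutant columns) instead of the real `madrid`, and it assumes dplyr 1.0.0 or later for `across()` and `.groups`. The names `madrid_demo`, `pollutants`, and `madrid_month` are illustrative only.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `madrid`: only two pollutants, three years,
# five readings per month, so the sketch runs on its own.
set.seed(1)
madrid_demo <- expand.grid(month = 1:12, year = 2001:2003, reading = 1:5)
madrid_demo$CO  <- runif(nrow(madrid_demo))
madrid_demo$O_3 <- runif(nrow(madrid_demo), min = 20, max = 80)
madrid_demo$CO[1:10] <- NA  # mimic the missing values in the real data

# For the real data, this would list all ten pollutant column names.
pollutants <- c("CO", "O_3")

madrid_month <- madrid_demo %>%
  group_by(month) %>%  # month only, so every year falls into the same group
  summarise(across(all_of(pollutants),
                   ~ mean(.x, na.rm = TRUE),
                   .names = "{.col}_mean"),
            .groups = "drop")

# Sum only the *_mean columns, selected by name rather than by position.
madrid_month$tot <- rowSums(select(madrid_month, ends_with("_mean")),
                            na.rm = TRUE)

# One point per month, joined by a single line.
p4 <- ggplot(madrid_month, aes(x = month, y = tot)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = 1:12)
```

Selecting the summary columns with `ends_with("_mean")` avoids having to count column positions. Note that this pools every observation for a given month; averaging the per-year monthly means instead would weight each year equally and could give slightly different numbers.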