I have a data set with the following columns:
locID       = the location of ID of the observer
yr          = the year of the observation in categorical format: P_year
maxFlock    = a number counted by the observer
lat         = latitude of the location
long        = longitude of the location
state       = US state of the observation
effortDays  = categorical, I, II, III, and IV
effortHours = categorical, A, B, C, D 
Here is a sample of the data frame:
PData
  locID    yr  maxFlock     lat   long state effortDays effortHours 
L4278   P_2000        3   41.42 -73.67    NY        II           C
L4278   P_2000        6   41.42 -73.67    NY       III           C
L4278   P_2000        4   41.42 -73.67    NY       III           C
L4278   P_2012        2   41.42 -73.67    NY       III           B
L4278   P_2012        4   41.42 -73.67    NY        IV           B
L4278   P_2012        8   41.42 -73.67    NY        IV           B
L10494  P_2003        4   42.01 -77.44    NY        IV           C
L10494  P_2003        0   42.01 -77.44    NY        IV           C
L10494  P_2003        8   42.01 -77.44    NY        IV           D
L10494  P_2005        4   42.01 -77.44    NY        IV           C
L10494  P_2005        6   42.01 -77.44    NY        IV           C
L10494  P_2009        8   42.01 -77.44    NY        IV           C
I want to make a new column (labelled: xmf) that computes the average of maxFlock. The average, though, has to be computed for each unique combination of locID, yr, effortDays, and effortHours . If I were to run the code on the above sample, the final product would look like this.
PData
  locID     yr maxFlock         xmf        lat    long  state effortDays effortHours 
L4278   P_2000        3           3       41.42 -73.67    NY        II           C
L4278   P_2000        6           5       41.42 -73.67    NY       III           C
L4278   P_2000        4           5       41.42 -73.67    NY       III           C
L4278   P_2012        2           2       41.42 -73.67    NY       III           B
L4278   P_2012        4           6       41.42 -73.67    NY        IV           B
L4278   P_2012        8           6       41.42 -73.67    NY        IV           B
L10494  P_2003        4           2       42.01 -77.44    NY        IV           C
L10494  P_2003        0           2       42.01 -77.44    NY        IV           C
L10494  P_2003        8           8       42.01 -77.44    NY        IV           D
L10494  P_2005        4           5       42.01 -77.44    NY        IV           C
L10494  P_2005        6           5       42.01 -77.44    NY        IV           C
L10494  P_2009        8           8       42.01 -77.44    NY        IV           C
I originally tried to do this using:
PData$xmf = ave(myData2$maxFlock, myData2$locID, myData2$yr, myData2$effortDays, myData2$effortHours)
But it didn't work (had to kill it after waiting over a half hour), and I'm also not even sure if ave() can do what I want it to do.
I was thinking about trying something with the split-apply-combine methodology, but I don't think that's exactly what I'm looking for, because I would have to subset for locID, then subset for year, then for effortHours OR effortDays, and I don't want to have to make that choice. I want to do it by unique combinations.
It would also be great if there were a fast way to do this. The data I'm working with is about 2.5 million rows, so if-statements inside of for-loops are definitely not ideal.
Thank you!