I want to use R to sample my data frame. My data is timestamped epidemiological data, and I want to randomly sample at least 1 and at most 10 records for each year, preferably with the sample size scaled to the number of records in that year. I would like to export the results as a CSV file.
Here are a few lines of my dataset; I've left off the long genetic sequence field for each record.
year    matrix  USD     clade
1958    W       mG018U  UP
1958    W       mG018U  UP
1958    W       mG018U  UP
1966    UN      mG140L  LL
1969    UN      mG207L  LL
1969    UN      mG013L  LL
1971    UN      mG208L  LL
1972    HA      mG129M  MN
1973    C1      mG018U  UP
1973    NA      mG001U  UC
1973    NA      mG001U  UC
All I've learned to do so far is
sample(mydata, size = 600, replace = FALSE)
which of course doesn't take the year into account.
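
To make the goal concrete, here is a minimal base R sketch of the kind of per-year sampling described above. It is only one possible approach, not a definitive solution. The helper name `sample_by_year`, its arguments `frac`, `min_n` and `max_n`, and the output file name are all hypothetical, and the toy data frame simply copies the rows shown earlier. Note also that calling `sample()` directly on a data frame draws from its columns rather than its rows, so the sketch samples row indices instead.

```r
# Toy data copied from the rows shown above (the sequence field is omitted).
mydata <- data.frame(
  year   = c(1958, 1958, 1958, 1966, 1969, 1969, 1971, 1972, 1973, 1973, 1973),
  matrix = c("W", "W", "W", "UN", "UN", "UN", "UN", "HA", "C1", "NA", "NA"),
  USD    = c("mG018U", "mG018U", "mG018U", "mG140L", "mG207L", "mG013L",
             "mG208L", "mG129M", "mG018U", "mG001U", "mG001U"),
  clade  = c("UP", "UP", "UP", "LL", "LL", "LL", "LL", "MN", "UP", "UC", "UC"),
  stringsAsFactors = FALSE
)

set.seed(42)  # only so the sketch is reproducible

# Hypothetical helper: for each year, sample a fraction `frac` of the rows,
# clamped to at least `min_n` and at most `max_n` (and never more rows than
# the year actually has).
sample_by_year <- function(df, frac = 0.1, min_n = 1, max_n = 10) {
  groups <- split(df, df$year)
  picked <- lapply(groups, function(g) {
    n <- min(nrow(g), max(min_n, min(max_n, ceiling(frac * nrow(g)))))
    g[sample.int(nrow(g), n), , drop = FALSE]  # sample rows without replacement
  })
  out <- do.call(rbind, picked)
  rownames(out) <- NULL
  out
}

# frac = 0.5 is an arbitrary choice so the tiny toy data shows some scaling.
sampled <- sample_by_year(mydata, frac = 0.5)

# Export the result; the file name is an assumption.
write.csv(sampled, "sampled_by_year.csv", row.names = FALSE)
```

With a real dataset, `frac` would need tuning so that the larger years actually reach the cap of 10 while sparse years still contribute one record.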
 
     
    