I need to split a large dataframe of meterological timeseries into a training and validation samples. It contains data from multiple stations, which have varying period of observations. How could I divide it so that proportion of training and validation observations is equal across each station. Given the following dataset:
| Station | Date | temp | 
|---|---|---|
| A | 2012-01-01 | -0.8 | 
| A | 2012-01-02 | 0.1 | 
| A | 2012-01-03 | 0.5 | 
| A | 2012-01-04 | 0.4 | 
| B | 2012-01-01 | 0.1 | 
| B | 2012-01-02 | 0.5 | 
and assuming that the training set should include only first 50% of the observations from each station, the desired output would be:
| Station | Date | temp | 
|---|---|---|
| A | 2012-01-01 | -0.8 | 
| A | 2012-01-02 | 0.1 | 
| B | 2012-01-01 | 0.1 | 
 
    