I have a dataframe with two columns. How can I randomly split it by column "id" in a 70/30 ratio? The ratio should apply to the distinct ids, so id 7, despite occurring 3 times, counts only as 1 of the 10 ids.
The existing question "How to split data into 3 sets (train, validation and test)?" does not help in this case.
import pandas as pd
d = {'id': [1,2,3,3,4,5,6,7,7,7,8,9,10,10], 'col2': [3,4,5,7,8,9,1,5,9,10,11,4,1,7]}
df = pd.DataFrame(data=d)
So one possible output df1_30 would be:
>>> df1_30
     id   col2
0    1    3
2    3    5
3    3    7
11   9    4
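To pin down what counts as a valid df1_30, here is a rough sketch of a check a candidate would have to pass. `is_valid_id_split` is a hypothetical helper, and it assumes the 30% is rounded to a whole number of ids (3 of the 10 here).

```python
import pandas as pd

d = {'id': [1,2,3,3,4,5,6,7,7,7,8,9,10,10], 'col2': [3,4,5,7,8,9,1,5,9,10,11,4,1,7]}
df = pd.DataFrame(data=d)

def is_valid_id_split(df, part, frac=0.3):
    # Hypothetical helper: a valid part holds every row of each id it contains, and no other rows
    ids = set(part['id'])
    expected = df[df['id'].isin(ids)]
    complete_ids = part.sort_index().equals(expected.sort_index())
    # The ratio is measured on distinct ids, so id 7 (three rows) counts once
    right_size = len(ids) == round(df['id'].nunique() * frac)
    return complete_ids and right_size

example = df.loc[[0, 2, 3, 11]]   # the output shown above (ids 1, 3, 9)
broken = df.loc[[0, 2, 11]]       # id 3 would end up in both parts
print(is_valid_id_split(df, example), is_valid_id_split(df, broken))
```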
Another possible output of df1_30 could also be (just for clarification):
>>> df1_30
     id   col2
0    1    3
10   8    11
11   9    4
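One possible approach, sketched with pandas and NumPy: sample 30% of the distinct ids rather than 30% of the rows, then select rows with `isin`. `split_by_id` and its `seed` parameter are hypothetical names used only for illustration, and the fraction is assumed to be rounded to a whole number of ids.

```python
import numpy as np
import pandas as pd

d = {'id': [1,2,3,3,4,5,6,7,7,7,8,9,10,10], 'col2': [3,4,5,7,8,9,1,5,9,10,11,4,1,7]}
df = pd.DataFrame(data=d)

def split_by_id(df, frac=0.3, seed=None):
    # Hypothetical helper: draw the split over unique ids, not rows, so each id counts once
    rng = np.random.default_rng(seed)
    ids = df['id'].unique()
    n_small = int(round(len(ids) * frac))
    small_ids = rng.choice(ids, size=n_small, replace=False)
    mask = df['id'].isin(small_ids)
    # Every row of a chosen id goes to the 30% part; all remaining rows go to the 70% part
    return df[~mask], df[mask]

df1_70, df1_30 = split_by_id(df, frac=0.3, seed=42)
print(df1_30)
```

Because the ratio applies to ids, the number of rows in each part varies between draws (the two example outputs above have 4 and 3 rows respectively).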