I am using the iris data set from sklearn. I need to split the data, sample the training set without replacement according to a list of proportions, fit a Naive Bayes classifier on each sample, record the scores, and return a dictionary that maps the sample size used to fit the model (key) to the corresponding scores (training and test score as a tuple).
I need some help with the part that returns the dictionary. Below is what I have done to build it. I am unsure whether it is correct, or whether there is a better way to do this.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
# Two separate lists: score_list=shape_list=[] would bind both names to the same list
score_list, shape_list = [], []
iris = load_iris()
props=[0.2,0.5,0.7,0.9]
df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                    columns= iris['feature_names'] + ['target'])
y = df['target']
X = df.drop(columns='target')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    train_size=0.7)
for i in props:
    ix = np.random.choice(X_train.index, size=int(i*len(X_train)), replace = False)
    sampleX = X_train.loc[ix]
    sampleY = y_train.loc[ix]
    modelNB = MultinomialNB()
    modelNB.fit(sampleX, sampleY)
    train_score=modelNB.score(sampleX,sampleY)
    test_score=modelNB.score(X_test,y_test)
    score_list.append((train_score , test_score))
    shape_list.append(sampleX.shape[0])
print(dict(zip(shape_list,score_list)))
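For comparison, here is a minimal sketch of one possible way to build the dictionary directly inside the loop instead of zipping two lists afterwards. The function name `scores_by_sample_size` and the `seed` parameter are hypothetical and only there to make the example reproducible; it also works on the raw NumPy arrays rather than a DataFrame, which is an assumption about what is acceptable here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def scores_by_sample_size(props, seed=0):
    """Map each training-sample size to a (train_score, test_score) tuple.

    Hypothetical helper; `seed` is an assumption added for reproducibility.
    """
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    rng = np.random.default_rng(seed)
    results = {}
    for p in props:
        n = int(p * len(X_train))
        # Draw n distinct row positions (sampling without replacement).
        idx = rng.choice(len(X_train), size=n, replace=False)
        model = MultinomialNB().fit(X_train[idx], y_train[idx])
        # Store the scores under the sample size as soon as they are known.
        results[n] = (
            model.score(X_train[idx], y_train[idx]),
            model.score(X_test, y_test),
        )
    return results


scores = scores_by_sample_size([0.2, 0.5, 0.7, 0.9])
print(scores)
```

One caveat with using the sample size as the key: if two proportions round to the same size, the later entry overwrites the earlier one.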