I am building a classifier to predict whether the output will be +50000 or -50000, using 11 string features with a random forest classifier. Since RandomForestClassifier requires numeric input, I am using DictVectorizer to convert the string features to numbers. However, DictVectorizer creates a different number of features (240-260) for different parts of the data, and this causes an error when predicting output from the model. One sample input row is:
{'detailed household summary in household': ' Spouse of householder',
 'tax filer stat': ' Joint both under 65',
 'weeks worked in year': ' 52',
 'age': '32', 
 'sex': ' Female',
 'marital status': ' Married-civilian spouse present',
 'full or part time employment stat': ' Full-time schedules',
 'detailed household and family stat': ' Spouse of householder', 
 'education': ' Bachelors degree(BA AB BS)',
 'num persons worked for employer': ' 3',
 'major occupation code': ' Adm support including clerical'}
Is there some way I can convert the input so that I can use a random forest classifier to predict the output?
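To illustrate why the column count can vary: as far as I understand it, DictVectorizer one-hot encodes string values, creating one column per distinct `key=value` pair it sees while fitting. The sketch below mimics that idea in plain Python. The helper `onehot_columns` and the toy rows are made up for illustration and are not part of scikit-learn or the census file.

```python
def onehot_columns(rows):
    """Return the sorted 'key=value' column names a one-hot encoding would need."""
    return sorted({f"{key}={value}" for row in rows for key, value in row.items()})


# Toy rows, invented for illustration (not taken from the census file).
subset_a = [
    {"sex": " Female", "age": "32"},
    {"sex": " Male", "age": "45"},
]
subset_b = [
    {"sex": " Female", "age": "32"},
    {"sex": " Female", "age": "32"},
]

cols_a = onehot_columns(subset_a)
cols_b = onehot_columns(subset_b)

# The two subsets contain different distinct values, so they need
# different numbers of one-hot columns.
print(len(cols_a), cols_a)
print(len(cols_b), cols_b)
```

Any two subsets that contain different sets of distinct values end up with different column sets, which would be consistent with the 240-260 range I am seeing.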
Edit: The code I am using is:
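    import csv
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.model_selection import train_test_split

    # columnNames is the list of CSV column names, defined elsewhere
    X,Y=[],[]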
    features=[0,4,7,9,12,15,19,22,23,30,39]
    with open("census_income_learn.csv","r") as fl:
        reader=csv.reader(fl)
        for row in reader:
            data={}
            for i in features:
                data[columnNames[i]]=str(row[i])
            X.append(data)
            Y.append(str(row[41]))
    X_train, X_validate, Y_train, Y_validateActual = train_test_split(X, Y, test_size=0.2, random_state=32)
    vec = DictVectorizer()
    X_train=vec.fit_transform(X_train).toarray()
    X_validate=vec.fit_transform(X_validate).toarray()
    print("data ready")
    forest = RandomForestClassifier(n_estimators = 100)
    forest = forest.fit( X_train, Y_train )
    print("model created")
    Y_predicted=forest.predict(X_validate)
    print(Y_predicted)
So if I print the first elements of the training set and the validation set, X_train[0] has 252 features, whereas X_validate[0] has only 249.
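One likely cause of the mismatch is that `fit_transform` is called a second time on the validation set, which re-learns the columns from the validation data alone. A minimal sketch of the usual pattern, assuming the vectorizer should be fitted only on the training data and then reused with `transform`; the toy rows below are invented for illustration:

```python
from sklearn.feature_extraction import DictVectorizer

# Toy rows, invented for illustration (not taken from the census file).
train_rows = [
    {"sex": " Female", "education": " Bachelors degree(BA AB BS)"},
    {"sex": " Male", "education": " High school graduate"},
]
validate_rows = [
    # This education value never appears in the training rows.
    {"sex": " Female", "education": " Masters degree"},
]

vec = DictVectorizer()
# Learn the set of columns from the training data only.
X_train = vec.fit_transform(train_rows).toarray()
# Reuse the learned columns; transform() does not refit.
X_validate = vec.transform(validate_rows).toarray()

# Both arrays now have the same number of columns.
print(X_train.shape, X_validate.shape)
print(vec.feature_names_)
```

With this pattern, a category that appears only in the validation data is silently ignored instead of creating a new column, so the model always receives the same number of features it was trained on. Numeric fields such as `age` may also be better cast to `int` before vectorizing, so they stay a single numeric column instead of one column per distinct value.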
 
    