The return column might contain numeric values, like below:
import pandas as pd
data_dict = {'return': [-1, 0, 2], 'col2': [10, 11, 12]}
data = pd.DataFrame(data_dict)
r = data[['return']]
r.head()
for num in r:
    if num >= 0:
        num = 1
    else:
        num = 0
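This raises TypeError: '>=' not supported between instances of 'str' and 'int'. I think this is because iterating over a DataFrame goes through its column labels (here the string 'return') rather than the values in the column, so the comparison is between a string and an integer. Even when looping over the values themselves, rebinding num would not change anything in the DataFrame.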
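To check what the loop actually iterates over, here is a small sketch on the same example data. The names labels and values are only for illustration and are not part of the original code:

```python
import pandas as pd

data = pd.DataFrame({'return': [-1, 0, 2], 'col2': [10, 11, 12]})
r = data[['return']]

# Iterating over a DataFrame yields its column labels, not cell values.
labels = [num for num in r]

# Iterating over a single column (a Series) yields the values instead.
values = [num for num in r['return']]

print(labels)  # ['return']
print(values)
```

So the loop body only ever sees the string 'return', which explains the comparison error.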
I think a nicer solution is to use a vectorized assignment with a boolean mask instead of a for loop. But overwriting the same column this way gives a warning, since r was itself selected from data:
r.loc[r['return'] >= 0,'return'] = 1
r.loc[r['return'] < 0,'return'] = 0
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
So you might write the result to a new column instead, which also keeps the original values:
r.loc[r['return'] >= 0, 'return2'] = 1
r.loc[r['return'] < 0, 'return2'] = 0
r['return2'] = r['return2'].astype('int')
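As a possible alternative, and assuming the warning comes from r being a selection out of data, you could take an explicit copy with .copy() and build the new column in one step by casting the boolean mask to integers. This is only a sketch on the example data above, not the only way to do it:

```python
import pandas as pd

data = pd.DataFrame({'return': [-1, 0, 2], 'col2': [10, 11, 12]})

# Take an explicit copy so that r is independent of data.
r = data[['return']].copy()

# The comparison gives a boolean Series; casting it to int yields 1 for
# non-negative returns and 0 for negative ones.
r['return2'] = (r['return'] >= 0).astype(int)

print(r)
```

Because the whole column is assigned at once, there are no missing values in between and no separate astype('int') step is needed afterwards.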