I am doing some operations on a pandas dataframe, specifically:
- Dropping a column
- Using the DataFrame.apply() function to add a column based on an existing one
Here's the simplest test-case I've been able to create:
import numpy as np
import pandas as pd
df = pd.DataFrame(
    [["Fred", 1, 44],
     ["Wilma", 0, 39],
     ["Barney", 1, None]],
    columns=["Name", "IntegerColumn", "Age"])
def translate_age(series):
    # Called once per row (axis=1); 'series' holds that row's values.
    if not np.isnan(series["Age"]):
        series["AgeText"] = "Over 40" if series["Age"] > 40 else "Under 40"
    else:
        series["AgeText"] = "Unknown"
    return series
    
df = df.drop('Name', axis=1)
print('@ before', df['IntegerColumn'].dtypes)
df = df.apply(func=translate_age, axis=1)
print('@ after', df['IntegerColumn'].dtypes)
The print() output shows the change in the IntegerColumn's type. It started as an integer:
@ before int64
... and then after the apply() call, it changes to a float:
@ after float64
Initially, the dataframe looks like this:
     Name  IntegerColumn   Age
0    Fred              1  44.0
1   Wilma              0  39.0
2  Barney              1   NaN
... after the apply() call, it looks like this:
   IntegerColumn   Age   AgeText
0            1.0  44.0   Over 40
1            0.0  39.0  Under 40
2            1.0   NaN   Unknown
Why is the IntegerColumn changing from an integer to a float in this case? And how can I stop it from doing so?
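A narrower check that seems to point at the cause, offered as a sketch rather than a confirmed explanation: with axis=1, each row is handed to the function as a single Series, and a Series has only one dtype, so an int64 value sitting next to a float64 value gets upcast to float64. The snippet below inspects the dtype of one row and then tries a possible workaround that builds only the new column from "Age" instead of rebuilding every row. The helper name age_text and the variables row_dtype and result are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["Fred", 1, 44],
     ["Wilma", 0, 39],
     ["Barney", 1, None]],
    columns=["Name", "IntegerColumn", "Age"])
df = df.drop("Name", axis=1)

# A single row is one Series with one dtype, so the int64 and float64
# values are upcast to a common type.
row_dtype = df.iloc[0].dtype
print("row dtype:", row_dtype)

# Hypothetical helper: maps one age value to its label.
def age_text(age):
    if pd.isna(age):
        return "Unknown"
    return "Over 40" if age > 40 else "Under 40"

# Apply to the "Age" column only, so the other columns are never
# passed through a row-wise Series and keep their dtypes.
result = df.copy()
result["AgeText"] = result["Age"].apply(age_text)
print("@ after", result["IntegerColumn"].dtypes)
```

Because only the "Age" column goes through apply() here, IntegerColumn is left untouched. If the row-wise apply() has to stay, casting the column back afterwards with astype("int64") may be another option.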
 
    