I have some addresses that I would like to clean.
In column address1, some entries are just house numbers. They should contain a number and a street name, like the first three rows; for these rows the street name has ended up in address2.
import numpy as np
import pandas as pd

df = pd.DataFrame({'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
                   'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street']})
print(df)
         address1       address2
0  15 Main Street       New York
1  10 High Street             LA
2  5 Other Street         London
3             NaN          Tokyo
4              15   Grove Street
5              12  Garden Street
I'm trying to write a function that checks whether address1 is a number and, if so, concatenates it with the street name from address2 and then clears address2 (sets it to NaN).
This is my expected output; rows 4 and 5 now have complete address1 entries:
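The same logic can also be written without a row-wise function, using a boolean mask. This is a minimal sketch, assuming pandas 1.1 or later (for Series.str.fullmatch) and assuming a number-only address1 contains digits only, with no surrounding whitespace:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
                   'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street']})

# True where address1 consists only of digits; na=False turns NaN rows into False
mask = df['address1'].str.fullmatch(r'\d+', na=False)

# On those rows only, join the house number with the street name from address2
df.loc[mask, 'address1'] = df.loc[mask, 'address1'] + ' ' + df.loc[mask, 'address2']

# Clear address2 on the same rows
df.loc[mask, 'address2'] = np.nan

print(df)
```

Because the mask is computed once, both columns are updated on exactly the same rows, and the NaN in row 3 is left untouched.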
           address1  address2
0    15 Main Street  New York
1    10 High Street        LA
2    5 Other Street    London
3               NaN     Tokyo
4   15 Grove Street       NaN <---
5  12 Garden Street       NaN <---
Here is what I have tried with .apply():
def f(x):
    try:
        # if address1 is int
        if isinstance(int(x['address1']), int):
            # create new address using address1 + address2
            newaddress = str(x['address1']) + ' ' + str(x['address2'])
            # delete address2
            x['address2'] = np.nan
            # return newaddress to address1 column
            return newadress
    except:
        pass
Applying the function:
df['address1'] = df.apply(f, axis=1)
However, the column address1 is now all None.
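The bare except: pass hides whatever goes wrong inside f. One way to check, assuming the same df as above, is a copy of f (f_debug, a hypothetical name) that catches only ValueError, which is what int() raises for text such as '15 Main Street' and for NaN. The number-only rows then raise NameError, because the function returns newadress while the variable is named newaddress. With the bare except, that error is swallowed and f returns None, just as it does for every row where int() fails.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
                   'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street']})

def f_debug(x):
    # Same logic as f, but only ValueError (the expected failure of int()) is caught
    try:
        if isinstance(int(x['address1']), int):
            newaddress = str(x['address1']) + ' ' + str(x['address2'])
            return newadress  # typo kept on purpose to reproduce the original bug
    except ValueError:
        return None

caught = None
try:
    df.apply(f_debug, axis=1)
except NameError as err:
    # Row 4 ('15') gets past int() and hits the misspelled name
    caught = err
    print('apply failed:', err)
```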
I've tried a few variations of this function but can't get it to work. Any advice would be appreciated.
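A sketch of a working .apply() version, assuming a number-only address1 has no surrounding whitespace. Compared with f, the helper (merge_street, a hypothetical name) returns a value for every row, tests with str.isdigit() instead of relying on an exception, and returns both columns as a Series so address2 is cleared through the return value rather than by mutating x. The pandas DataFrame.apply documentation states that functions which mutate the passed object are not supported.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
                   'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street']})

def merge_street(x):
    # Always return both columns, so unchanged rows come back as-is instead of None
    addr1 = x['address1']
    # NaN is a float, so the isinstance check skips it before isdigit() is called
    if isinstance(addr1, str) and addr1.isdigit():
        return pd.Series({'address1': addr1 + ' ' + x['address2'], 'address2': np.nan})
    return x[['address1', 'address2']]

# apply expands the returned Series into columns, which replace the originals
df[['address1', 'address2']] = df.apply(merge_street, axis=1)
print(df)
```

The vectorized mask approach is usually faster on large frames, but this version stays close to the structure of the original function.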