I am confused by the behavior of pandas to_datetime on non-American dates.
In this trivial example, Pandas correctly infers the months on both the 2nd and 3rd row, but fails on the 1st and 4th.
Apparently it is treating the 2nd and 3rd line as dd/mm/yyyy dates (because the 13 and 27 clearly can't be months), but treating the remaining dates as being mm/dd/yyyy.
My expectation would be that to_datetime would infer from the entire series, and then treat every entry the same.
import pandas as pd
results = pd.DataFrame()
european_dates = pd.Series(['05/04/2007', # <-- April 5th, 2007
'13/04/2006', # <-- April 13th, 2006
'27/12/2014', # <-- December 27th, 2014
'02/07/2010']) # <-- July 2nd, 2010
# note: the same happens with infer_datetime_format=False
inferred_dates = pd.to_datetime(european_dates,
infer_datetime_format=True)
results['day'] = inferred_dates.dt.day
results['month'] = inferred_dates.dt.month
results['year'] = inferred_dates.dt.year
results
Note: I'm aware that to_datetime has a dayfirst argument, and a format argument, my question is mostly about why the infer_datetime_format fails in such a trivial case.
