I have a Pandas DataFrame with date values stored in two columns in the following formats:
col1: 04-APR-2018 11:04:29
col2: 2018040415203
How can I convert these to timestamps? The dtype of both columns is object.

For the first format you can simply pass the column to to_datetime; for the second you need to describe the date format explicitly (see the table of available format directives in the Python docs):
In [21]: df
Out[21]:
                   col1           col2
0  04-APR-2018 11:04:29  2018040415203
In [22]: pd.to_datetime(df.col1)
Out[22]:
0   2018-04-04 11:04:29
Name: col1, dtype: datetime64[ns]
In [23]: pd.to_datetime(df.col2, format="%Y%m%d%H%M%S")
Out[23]:
0   2018-04-04 15:20:03
Name: col2, dtype: datetime64[ns]
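As a standalone script, here is a minimal sketch that rebuilds the sample frame from the question and converts both columns in place. Passing an explicit format for col1 as well (%d-%b-%Y %H:%M:%S, where %b matches the abbreviated month regardless of case) is my own addition, not part of the answer above.

```python
import pandas as pd

# Sample data from the question; both columns hold strings (dtype object)
df = pd.DataFrame({
    "col1": ["04-APR-2018 11:04:29"],
    "col2": ["2018040415203"],
})

# col1: day-abbreviated month-year hour:minute:second
df["col1"] = pd.to_datetime(df["col1"], format="%d-%b-%Y %H:%M:%S")

# col2: digits only, so every component must be spelled out in the format
df["col2"] = pd.to_datetime(df["col2"], format="%Y%m%d%H%M%S")

print(df.dtypes)
print(df)
```

Note that the sample col2 value has only 13 digits, so %S ends up matching the single trailing digit 3, which is why the result is 15:20:03.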
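
You can also try these. Try passing infer_datetime_format=True together with parse_dates while reading the file (note that this argument is deprecated as of pandas 2.0, where a strict form of format inference is the default).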
If parsing while reading doesn't work, convert the column after loading instead:
df2 = pd.to_datetime(df.col1)
or
df2 = pd.to_datetime(df['col1'])
df2
Note that these only convert the strings to datetimes and return the result as a standalone Series, df2, which holds none of the DataFrame's other columns. If you want to keep the other columns and give the converted values a column of their own, try the following:
df['col1_converted'] = pd.to_datetime(df.col1)
or
df['col1_converted'] = pd.to_datetime(df['col1'])
This is convenient if you don't want to create a separate object, or if you want to refer to the converted column later together with the DataFrame's other columns.
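A small sketch of the difference between the two approaches. The extra column "other" is hypothetical, added only to show what is kept, and an explicit format is passed so the example does not depend on format inference.

```python
import pandas as pd

# Sample frame: col1 from the question plus a hypothetical extra column
df = pd.DataFrame({
    "col1": ["04-APR-2018 11:04:29"],
    "other": ["kept"],
})

FMT = "%d-%b-%Y %H:%M:%S"

# Standalone result: a Series sharing df's index, with no other columns
df2 = pd.to_datetime(df["col1"], format=FMT)
print(type(df2).__name__)   # Series

# New column: the original strings and the other columns stay in df
df["col1_converted"] = pd.to_datetime(df["col1"], format=FMT)
print(df.columns.tolist())  # ['col1', 'other', 'col1_converted']
```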

There are a few ways to convert column values into timestamps, some more efficient than others. N.B. Passing format= to to_datetime makes the conversion much, much faster (see this post). You can find all the available format directives at https://strftime.org/.
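To check the N.B. on your own machine, here is a rough timing sketch. The benchmark data is hypothetical: 5,000 distinct strings in col1's style, distinct so that pandas' conversion cache cannot hide the parsing cost. The gap you see will depend on your pandas version, since from pandas 2.0 on to_datetime tries to infer a format from the first value when none is given (and may warn if it cannot).

```python
import timeit
import pandas as pd

FMT = "%d-%b-%Y %H:%M:%S"

# Hypothetical benchmark data: distinct strings like "04-APR-2018 00:00:00"
s = pd.Series(
    pd.date_range("2018-04-04", periods=5_000, freq="s")
    .strftime(FMT)
    .str.upper()
)

def with_format():
    return pd.to_datetime(s, format=FMT)

def without_format():
    # pandas has to infer the format or parse each value on its own
    return pd.to_datetime(s)

print("with format=   :", timeit.timeit(with_format, number=3))
print("without format=:", timeit.timeit(without_format, number=3))
```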
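For example, each of the following converts col1:
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'col1': ['04-APR-2018 11:04:29']})  # sample data from the question
x = pd.to_datetime(df['col1'], format='%d-%b-%Y %H:%M:%S')  # vectorized, explicit format
y = df['col1'].apply(pd.Timestamp)  # one pd.Timestamp per element
z = df['col1'].apply(datetime.strptime, args=('%d-%b-%Y %H:%M:%S',))  # stdlib parser per element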
All three produce the same result (both x.equals(y) and x.equals(z) return True), which looks like:
0   2018-04-04 11:04:29
Name: col1, dtype: datetime64[ns]
The individual values are also equal (x[0] == y[0] == z[0] returns True), and each looks like
Timestamp('2018-04-04 11:04:29')
Looking at the source code, pd.Timestamp is a subclass of datetime.datetime, so every one of these values is ultimately a datetime.datetime instance.
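A quick way to confirm the subclass relationship described above, using the Timestamp value from this answer:

```python
from datetime import datetime
import pandas as pd

ts = pd.Timestamp("2018-04-04 11:04:29")

print(issubclass(pd.Timestamp, datetime))  # True
print(isinstance(ts, datetime))            # True

# datetime methods work on a Timestamp
print(ts.strftime("%Y/%m/%d %H:%M:%S"))    # 2018/04/04 11:04:29
# an equal stdlib datetime compares equal
print(ts == datetime(2018, 4, 4, 11, 4, 29))  # True
# to_pydatetime() returns a plain datetime.datetime
print(type(ts.to_pydatetime()) is datetime)   # True
```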
