I have a duration column whose values are in different formats: some are time-only and some also include a date. I want the duration column in total seconds. I tried converting the column with the to_datetime and parse_date methods, but it did not work. How can I do this in pandas? Here is the column:
- "but it cannot work..". Show your code that didn't work. – Austin Apr 17 '18 at 11:08
- Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Apr 17 '18 at 11:11
3 Answers
One way is to use `pd.Series.apply` with a try / except clause that attempts each conversion in turn: first `pd.to_timedelta`, then `pd.to_datetime`.
The benefit of this method is that it accepts a wide range of potential inputs for timedelta and datetime.
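import numpy as np
import pandas as pd

df = pd.DataFrame({'Mixed': ['03:59:49', '1904-01-01 04:06:08']})

def return_seconds(x):
    try:
        # Time-only strings such as '03:59:49' parse directly as a timedelta
        return pd.to_timedelta(x).total_seconds()
    except Exception:
        try:
            # Otherwise parse as a datetime and keep the offset from midnight
            dt = pd.to_datetime(x)
            return (dt - dt.normalize()).total_seconds()
        except Exception:
            # Neither parser accepted the value
            return np.nan

# astype(int) assumes every value parsed; it raises if any NaN remains
df['TotalSeconds'] = df['Mixed'].apply(return_seconds).astype(int)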
print(df)
#                  Mixed  TotalSeconds
# 0             03:59:49         14389
# 1  1904-01-01 04:06:08         14768
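
For larger columns, the same two-step fallback can be sketched without a Python-level loop. This is only a sketch under stated assumptions: the helper name `mixed_to_seconds` and the third sample value are hypothetical, and it relies on `errors="coerce"` turning unparseable entries into `NaT` instead of raising.

```python
import pandas as pd

def mixed_to_seconds(s):
    """Hypothetical helper: seconds since midnight for time-only or date+time strings."""
    # Step 1: try every value as a timedelta; failures become NaT
    as_td = pd.to_timedelta(s, errors="coerce")
    seconds = as_td.dt.total_seconds()

    # Step 2: only for the failures, parse as datetime and keep the time of day
    as_dt = pd.to_datetime(s[as_td.isna()], errors="coerce")
    fallback = (as_dt - as_dt.dt.normalize()).dt.total_seconds()

    # Fill the gaps left by step 1 with the results of step 2 (aligned on the index)
    return seconds.fillna(fallback)

# The third value is made up to show what happens when both parsers fail
df = pd.DataFrame({"Mixed": ["03:59:49", "1904-01-01 04:06:08", "not a time"]})
df["TotalSeconds"] = mixed_to_seconds(df["Mixed"])
print(df)
```

Values that neither parser accepts stay as NaN here, so the result is a float column; a plain `.astype(int)` would only be safe once those rows are filled or dropped.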
– jpp

Take the last 8 characters of each string, convert them with to_timedelta, and then use total_seconds:
df = pd.DataFrame({'col':['03:59:49', '1904-01-01 04:06:08']})
df['new'] = pd.to_timedelta(df['col'].str[-8:]).dt.total_seconds().astype(int)
print(df)
#                    col    new
# 0             03:59:49  14389
# 1  1904-01-01 04:06:08  14768
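EDIT: If some values are in another format, parse the column as datetimes first and keep only the time part:
df['new'] = pd.to_timedelta(pd.to_datetime(df['col']).dt.strftime('%H:%M:%S')).dt.total_seconds().astype(int)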
– jezrael
- When I pass the original pandas Series from the DataFrame `data` as `data['duration'] = pd.to_timedelta(data['Finish_Netto'].str[-8:]).dt.total_seconds().astype(int)`, it shows the error "ValueError: expected hh:mm:ss format". – MAK Apr 17 '18 at 11:41
- @MAK - It seems the data contains another format, so please check the edited answer. – jezrael Apr 17 '18 at 11:47
- The data is a DataFrame and the column type is object. Now it gives another error: " TypeError:is not convertible to datetime" – MAK Apr 17 '18 at 11:53
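
The `TypeError` in the last comment would be consistent with the object column holding Python `datetime.time` / `datetime.datetime` objects rather than strings (for example after reading a spreadsheet), although the thread does not confirm this. Under that assumption, a per-cell helper could look like the sketch below; `to_seconds` and the sample values are hypothetical, and only the column name `Finish_Netto` comes from the comments.

```python
import datetime as dt
import pandas as pd

def to_seconds(value):
    """Hypothetical helper: seconds since midnight for a single cell."""
    if isinstance(value, str):
        try:
            # Time-only strings are parsed too; the date part is ignored below
            value = pd.to_datetime(value)
        except ValueError:
            return float("nan")
    # pd.Timestamp is a datetime.datetime subclass, so it is covered here as well
    if isinstance(value, (dt.datetime, dt.time)):
        return value.hour * 3600 + value.minute * 60 + value.second
    return float("nan")

# Assumed sample: a mix of time objects, datetime objects, and strings
data = pd.DataFrame({
    "Finish_Netto": pd.Series(
        [dt.time(3, 59, 49), dt.datetime(1904, 1, 1, 4, 6, 8), "03:59:49"],
        dtype=object,
    )
})
data["duration"] = data["Finish_Netto"].apply(to_seconds)
print(data)
```

Checking `data['Finish_Netto'].map(type).value_counts()` first would show which types the column actually contains before choosing a conversion.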

Using regex:
import pandas as pd
df = pd.DataFrame({"a": ["03:59:49", "04:59:49", "1904-01-01 05:59:49", "1904-01-01 06:59:49"]})
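# Raw string avoids invalid-escape warnings; expand=False returns a Series, which to_timedelta needs
df["TotalSeconds"] = pd.to_timedelta(df["a"].str.extract(r'(\d{2}:\d{2}:\d{2})', expand=False)).dt.total_seconds()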
print(df)
Output:
                     a  TotalSeconds
0             03:59:49       14389.0
1             04:59:49       17989.0
2  1904-01-01 05:59:49       21589.0
3  1904-01-01 06:59:49       25189.0
– Rakesh
- Thanks Rakesh, it works. But there are some NaN values; how can I delete them or convert them to 0? – MAK Apr 17 '18 at 11:38
- Ex: `pd.to_timedelta(df["a"].str.extract(r'(\d{2}:\d{2}:\d{2})', expand=False)).dt.total_seconds().fillna(0)` – Rakesh Apr 17 '18 at 11:46
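
Both options asked about in the comments (converting NaN to 0, or deleting those rows) can be sketched as follows. The sample frame, including the unparseable `"n/a"` entry, is made up for illustration; `fillna` and `dropna` are the standard pandas calls.

```python
import pandas as pd

# Assumed sample: the last value has no HH:MM:SS part, so it becomes NaN
df = pd.DataFrame({"a": ["03:59:49", "1904-01-01 05:59:49", "n/a"]})

times = df["a"].str.extract(r"(\d{2}:\d{2}:\d{2})", expand=False)
df["TotalSeconds"] = pd.to_timedelta(times).dt.total_seconds()

# Option 1: replace NaN with 0, after which an integer dtype is safe
filled = df["TotalSeconds"].fillna(0).astype(int)

# Option 2: drop the rows where no duration could be extracted
dropped = df.dropna(subset=["TotalSeconds"])

print(filled)
print(dropped)
```

Filling with 0 makes a missing duration indistinguishable from a real zero, so dropping the rows (or keeping NaN) may be the safer choice depending on how the column is used later.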

