I am trying to read a CSV file and cast one of the columns to datetime. However, I do not know why some data points, e.g. 2019-01-03 12:00:00, are missing the milliseconds, while the rest of the data contains milliseconds. This causes an error.
My question is two-fold:
- Since the current code below generates an error, how do I get around this and parse the datetime column?
- If I were to reproduce this CSV file, how can I ensure all datetime values have milliseconds?
from datetime import datetime
import pandas as pd
custom_date_parser = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
df = pd.read_csv('abc.csv', parse_dates=['endTime'], date_parser=custom_date_parser)
    endTime
0   2019-01-02 09:40:22.668
1   2019-01-02 09:48:09.040
2   2019-01-02 09:54:54.209
3   2019-01-02 09:59:28.768
4   2019-01-02 10:06:33.820
5   2019-01-02 10:17:38.818
6   2019-01-02 10:30:26.999
7   2019-01-02 10:43:54.516
8   2019-01-02 11:04:26.652
9   2019-01-02 11:30:22.316
10  2019-01-02 11:59:59.751
11  2019-01-03 09:37:11.223
12  2019-01-03 09:49:06.226
13  2019-01-03 10:01:58.397
14  2019-01-03 10:15:20.918
15  2019-01-03 10:31:28.438
16  2019-01-03 10:52:26.130
17  2019-01-03 11:07:09.128
18  2019-01-03 11:22:00.907
19  2019-01-03 11:45:55.349
20  2019-01-03 12:00:00
21  2019-01-04 09:39:48.753
22  2019-01-04 09:48:06.856
23  2019-01-04 09:58:44.608
24  2019-01-04 10:10:49.498
25  2019-01-04 10:26:29.543
26  2019-01-04 10:39:36.750
27  2019-01-04 10:49:59.504
28  2019-01-04 11:00:02.138
29  2019-01-04 11:11:20.630
30  2019-01-04 11:27:59.402
31  2019-01-04 11:52:12.061
32  2019-01-04 11:59:59.879
33  2019-01-07 09:36:06.436
34  2019-01-07 09:44:07.126
35  2019-01-07 09:54:28.718
36  2019-01-07 10:05:54.130
37  2019-01-07 10:19:45.046
38  2019-01-07 10:38:15.991
39  2019-01-07 11:01:45.755
40  2019-01-07 11:17:39.586
41  2019-01-07 11:45:39.668
42  2019-01-07 12:00:00
The error message is below:
  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3298, in converter
    date_parser(*date_cols), errors="ignore", cache=cache_dates
  File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
    custom_date_parser                      = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
TypeError: strptime() argument 1 must be str, not numpy.ndarray
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3309, in converter
    dayfirst=dayfirst,
  File "pandas\_libs\tslibs\parsing.pyx", line 589, in pandas._libs.tslibs.parsing.try_parse_dates
  File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
    custom_date_parser                      = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 577, in _strptime_datetime
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 359, in _strptime
ValueError: time data '2019-01-03 12:00:00' does not match format '%Y-%m-%d %H:%M:%S.%f'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<ipython-input-2-9b9600d4b508>", line 1, in <module>
    df_bars = pd.read_csv(f'C:\\Users\\someone\\Desktop\\CV\\2021\\data\\abc.csv',parse_dates=['endTime'],date_parser=custom_date_parser)
  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 468, in _read
    return parser.read(nrows)
  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 1057, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 2113, in read
    names, data = self._do_date_conversions(names, data)
  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 1846, in _do_date_conversions
    keep_date_col=self.keep_date_col,
  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3352, in _process_date_conversion
    data_dict[colspec] = converter(data_dict[colspec])
  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\parsers.py", line 3314, in converter
    return generic_parser(date_parser, *date_cols)
  File "C:\Users\someone\AppData\Local\Programs\Spyder\pkgs\pandas\io\date_converters.py", line 100, in generic_parser
    results[i] = parse_func(*args)
  File "<ipython-input-1-26516a4dc77b>", line 34, in <lambda>
    custom_date_parser                      = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 577, in _strptime_datetime
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\_strptime.py", line 359, in _strptime
ValueError: time data '2019-01-03 12:00:00' does not match format '%Y-%m-%d %H:%M:%S.%f'
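
For reference, here is a minimal sketch of one way both points could be approached. It is not necessarily the best or only way. The final ValueError says that '2019-01-03 12:00:00' has no fractional part, so a single '%Y-%m-%d %H:%M:%S.%f' format cannot match every row. The sketch reads the column as plain strings and picks the format per value. The inline `csv_text` sample and the helper name `parse_end_time` are made up for illustration (the real abc.csv is not shown), and it assumes the column has no empty cells. A possible cause of the missing milliseconds, which is only a guess since the writer of the file is unknown, is that `str()` on a Python `datetime` omits the fractional part when the microseconds are exactly zero. Formatting explicitly on output avoids that.

```python
import io
from datetime import datetime

import pandas as pd

# Hypothetical sample standing in for abc.csv (the real file is not shown).
csv_text = """endTime
2019-01-03 11:45:55.349
2019-01-03 12:00:00
2019-01-04 09:39:48.753
"""

# Point 1: read the column as plain strings, then parse each value with
# whichever of the two formats matches it.
df = pd.read_csv(io.StringIO(csv_text), dtype={"endTime": str})


def parse_end_time(text):
    """Parse a timestamp that may or may not carry fractional seconds."""
    has_fraction = "." in text
    fmt = "%Y-%m-%d %H:%M:%S.%f" if has_fraction else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt)


# pd.to_datetime makes sure the result is a datetime64 column.
df["endTime"] = pd.to_datetime(df["endTime"].map(parse_end_time))

# Point 2: when writing, format every value explicitly. %f always emits six
# digits (microseconds), so dropping the last three leaves milliseconds.
out = df.copy()
out["endTime"] = out["endTime"].dt.strftime("%Y-%m-%d %H:%M:%S.%f").str[:-3]
csv_out = out.to_csv(index=False)  # no path given, so the CSV text is returned
print(csv_out)
```

With this, the 12:00:00 row is parsed like the others and is written back as 2019-01-03 12:00:00.000. Newer pandas versions also offer format options on `pd.to_datetime` for mixed inputs, but the per-value `strptime` approach above does not depend on the pandas version.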