I am exporting data from SQL Server to S3 as a Parquet file. In SQL Server, the column's data type is date, and the values look like 2022-09-01, as a date should.
When I read the parquet file using pandas with the code below:
import pandas as pd
df = pd.read_parquet(r"path\to\file.parquet", engine='fastparquet')
# Show up to 500 columns and rows when printing
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
print(df)
The column that is date in the source comes back as datetime64[ns] when I read the Parquet file. I don’t know why this happens. The values look the same as in the source (2022-09-01), but the data type is datetime.
For other columns, the source data type was datetime and it stayed datetime. For this one, the source was date and it became datetime.
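To make the observation concrete, here is a minimal sketch of how the dtype can be checked, and how to confirm that the column holds only midnight timestamps (no time-of-day information). The DataFrame, the column name `order_date`, and the sample values are made up for illustration; they stand in for whatever `pd.read_parquet` returns.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pd.read_parquet;
# the column name and values are invented for illustration.
df = pd.DataFrame({"order_date": pd.to_datetime(["2022-09-01", "2022-09-02"])})

# Confirm the column is held as a datetime type in pandas.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["order_date"])

# normalize() sets the time part to midnight; if nothing changes, the column
# carries no time-of-day information beyond the original dates.
dates_only = bool((df["order_date"] == df["order_date"].dt.normalize()).all())

print(df.dtypes)
print("datetime column:", is_datetime, "| midnight only:", dates_only)
```

If the midnight check passes, the values themselves are unchanged; only the in-memory type differs.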
How can I stop this?
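One possible workaround, sketched under assumptions: pandas' default NumPy-backed dtypes have no date-only type, so a date column can be converted after reading into plain Python `datetime.date` objects (stored as an object column). The column name `order_date` and the sample values are hypothetical, and this is only a sketch, not necessarily the right fix for every pipeline.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pd.read_parquet.
df = pd.DataFrame({"order_date": pd.to_datetime(["2022-09-01", "2022-09-02"])})

# .dt.date drops the time part and returns Python datetime.date objects.
df["order_date"] = df["order_date"].dt.date

first_value = df["order_date"].iloc[0]
print(type(first_value), first_value)
```

The trade-off is that the converted column no longer supports the vectorized `.dt` accessor, so this conversion fits best as a final step before display or comparison.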
I don’t know what to tell the quality assurance team, who keep asking me why this happens. Is it simply how the Parquet reader behaves?