I am trying to compare a Table in Snowflake database with it's exported CSV file, which fails due to numeric columns in CSV exponential and datetime with milliseconds
Converted both the table read from snowflake and CSV to data frame and comparing.
sel_query = "SELECT * FROM X_TABLE"
cur.execute(sel_query)
# making dataframe from ouput of sel_query
df1 = pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])
# making datafrmae from csv file
df2 = pd.read_csv("csv_file.csv", encoding="utf-8", sep=',')
dfBool = (df1 != df2).stack()
dfdiff = pd.concat([df1.stack()[dfBool], df2.stack()[dfBool]], axis=1)
dfdiff.columns = ["Snowflake data", "CSV data"]
print(dfdiff)
df1.equals(df2) should return True and dfdiff should be empty
Instead the ouput is as below for dfdiff:
                                Snowflake data               CSV data
0   TENURE                               452                      452
    NUMBER                             40379813              4.03798e+07
    TIME                    2018-04-09 11:34:49      2018-04-09 11:34:49.000
1   TENURE                               364                      364
    NUMBER                             40377580              4.03776e+07
    TIME                    2018-04-09 09:11:29      2018-04-09 09:11:29.000