I tried all the solutions here: Pandas "Can only compare identically-labeled DataFrame objects" error
Didn't work for me. Here's what I've got. I have two data frames. One is a set of financial data that already exists in the system and another set that has some that may or may not exist in the system. I need to find the difference and add the stuff that doesn't exist.
Here is the code:
import pandas as pd
import numpy as np
from azure.storage.blob import AppendBlobService, PublicAccess, ContentSettings
from io import StringIO
dataUrl = "http://ichart.finance.yahoo.com/table.csv?s=MSFT"
blobUrlBase = "https://pyjobs.blob.core.windows.net/"
data = pd.read_csv(dataUrl)
abs = AppendBlobService(account_name='pyjobs', account_key='***')
abs.create_container("stocks", public_access = PublicAccess.Container)
abs.append_blob_from_text('stocks', 'msft', data[:25].to_csv(index=False))
existing = pd.read_csv(StringIO(abs.get_blob_to_text('stocks', 'msft').content))
ne = (data != existing).any(1)
the failing code is the final line. I was going through an article on determining differences between data frames.
I checked the dtypes on all columns, they appear to be the same. I also did a side by side output, I sorted teh axis, the indices, dropped the indices etc. Still get that bloody error.
Here is the output of the first row of existing and data
>>> existing[:1]
         Date       Open   High    Low  Close    Volume  Adj Close
0  2016-05-27  51.919998  52.32  51.77  52.32  17653700      52.32
>>> data[:1]
         Date       Open   High    Low  Close    Volume  Adj Close
0  2016-05-27  51.919998  52.32  51.77  52.32  17653700      52.32
Here is the exact error I receive:
>>> ne = (data != existing).any(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1169, in f
    return self._compare_frame(other, func, str_rep)
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3571, in _compare_frame
    raise ValueError('Can only compare identically-labeled '
ValueError: Can only compare identically-labeled DataFrame objects
 
     
     
     
    