I am trying to read a deliminated text file into a dataframe in python. The deliminator is not being identified when I use pd.read_table. If I explicitly set sep = ' ', I get an error: Error tokenizing data. C error. Notably the defaults work when I use np.loadtxt().
Example:
pd.read_table('http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt',
              comment = '%',
              header = None)
    0
0   1850 1 -0.777 0.412 NaN NaN...
1   1850 2 -0.239 0.458 NaN NaN...
2   1850 3 -0.426 0.447 NaN NaN...
3   1850 4 -0.680 0.367 NaN NaN...
4   1850 5 -0.687 0.298 NaN NaN...
If I set sep = ' ', I get another error:
pd.read_table('http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt',
              comment = '%',
              header = None,
              sep = ' ')
ParserError: Error tokenizing data. C error: Expected 2 fields in line 78, saw 58
Looking up this error, people suggest using header = None (already done) and setting sep =  explicitly, but that is causing the problem: Python Pandas Error tokenizing data. I looked up line 78 and can't see any problems. If I set error_bad_lines=False i get an empty df suggesting there is a problem with every entry.
Notably this works when I use np.loadtxt():
pd.DataFrame(np.loadtxt('http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt', 
                        comments = '%'))
    0   1   2   3   4   5   6   7   8   9   10  11
0   1850.0  1.0     -0.777  0.412   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1   1850.0  2.0     -0.239  0.458   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2   1850.0  3.0     -0.426  0.447   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
3   1850.0  4.0     -0.680  0.367   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
4   1850.0  5.0     -0.687  0.298   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
This suggests to me that there isn't something wrong with the file, but rather with how I am calling pd.read_table(). I looked through the documentation for np.loadtxt() in the hope of setting the sep to the same value, but that just shows: delimiter=None (https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html).
I'd prefer to be able to import this as a pd.DataFrame, setting the names, rather than having to import as a matrix and then convert to pd.DataFrame.
What am I getting wrong?
 
     
    