I'm trying to clean a pdf to turn it into a file for geocoding. I've been using tabula-py to rip the pdf and have had pretty good results up until the point of removing rows that are empty entirely. I'm not even sure if this is an efficient way to do this.
I've tried the majority of solutions SO has recommended to me and I still can't quite figure it out. I've set inplace = True, axis = 0 and 1, how="all". Tried indexing out the NaN values and that didn't work either.
import pandas as pd
import tabula
pd.set_option('display.width', 500)
df = tabula.read_pdf("C:\\Users\\Jack\\Documents\\Schoolwork\\Schoolwork\\WICResearch\\RefDocs\\wicclinicdirectory.pdf", pages='all', guess = False, pandas_options={'header': None})
df.columns = ["County_Name", "Clinic_Number", "Clinic_Name", "Address", "City", "Zip_Code", "Phone_Number", "Hours_of_Operation"]
df.drop(["Phone_Number", "Hours_of_Operation"], axis = 1, inplace = True)
#mass of code here that removes unwanted repeated column headers as by product of tabula reading PDFs.
df.drop(["Clinic_Name"], axis = 1, inplace = True)
    df[['ClinicNum','ClinicName']] = df.Clinic_Number.apply(lambda x: pd.Series(str(x).split(" ", maxsplit = 1)))
    df.drop(["Clinic_Number"], axis = 1, inplace = True)
    #df[~df.isin(['NaN', 'NaT']).any(axis=1)]
    #df.dropna(axis= 0, how ='all', inplace = True)
    NaNIndex = df.index[df.isnull().all(1)]
    print(NaNIndex)
    print(df)
The above code gives this output: 
Index: []
    County_Name                              Address                       City Zip_Code ClinicNum                                ClinicName
0           NaN                                  NaN  Ohio WIC Clinic Locations      NaN       nan                                       NaN
1           NaN                                  NaN                        NaN      NaN    Clinic                                       NaN
3         Adams                 9137 State Route 136                 West Union    45693       100                  Adams County WIC Program
4           NaN                                  NaN                        NaN      NaN       nan                                       NaN
5           NaN                                  NaN                        NaN      NaN       nan                                       NaN
6         Allen        940 North Cable Road, Suite 4                       Lima    45805       200                  Allen County WIC Program
7       Ashland                    934 Center Street                    Ashland    44805       300                Ashland County WIC Program
8           NaN                              Suite E                        NaN      NaN       nan                                       NaN
9     Ashtabula              Geneva Community Center                     Geneva    44041       403                         Geneva WIC Clinic
10          NaN                  62 West Main Street                        NaN      NaN       nan                                       NaN
11    Ashtabula    Jefferson United Methodist Church                  Jefferson    44047       402                      Jefferson WIC Clinic
12          NaN            125 East Jefferson Street                        NaN      NaN       nan                                       NaN
13    Ashtabula       Conneaut Human Resource Center                   Conneaut    44030       401                       Conneaut WIC Clinic
14          NaN                      327 Mill Street                        NaN      NaN       nan                                       NaN
15    Ashtabula                     3225 Lake Avenue                  Ashtabula    44004       400              Ashtabula County WIC Program
16          NaN                                  NaN                        NaN      NaN       nan                                       NaN
18          NaN                                  NaN                        NaN      NaN    Clinic                                       NaN
20    Ashtabula           St. Mary's Catholic Church                     Orwell    44076       490                         Orwell WIC Clinic
And what I'd like is:
Index: []
    County_Name                              Address                       City Zip_Code ClinicNum                                ClinicName
3         Adams                 9137 State Route 136                 West Union    45693       100                  Adams County WIC Program
6         Allen        940 North Cable Road, Suite 4                       Lima    45805       200                  Allen County WIC Program
7       Ashland                    934 Center Street                    Ashland    44805       300                Ashland County WIC Program
8           NaN                              Suite E                        NaN      NaN       nan                                       NaN
9     Ashtabula              Geneva Community Center                     Geneva    44041       403                         Geneva WIC Clinic
10          NaN                  62 West Main Street                        NaN      NaN       nan                                       NaN
11    Ashtabula    Jefferson United Methodist Church                  Jefferson    44047       402                      Jefferson WIC Clinic
12          NaN            125 East Jefferson Street                        NaN      NaN       nan                                       NaN
13    Ashtabula       Conneaut Human Resource Center                   Conneaut    44030       401                       Conneaut WIC Clinic
14          NaN                      327 Mill Street                        NaN      NaN       nan                                       NaN
15    Ashtabula                     3225 Lake Avenue                  Ashtabula    44004       400              Ashtabula County WIC Program
18          NaN                                  NaN                        NaN      NaN    Clinic                                       NaN
20    Ashtabula           St. Mary's Catholic Church                     Orwell    44076       490                         Orwell WIC Clinic
I am able to create the data frame I want with the correct headings but it still does not remove the NaN values. Or it removes the entire thing. I'd also like to be able to move the rows that are not all NaN values into the correlating ones so they are all one line.
I'm also not sure how reproducible I can get this as I have fiddled around with tabula quite a bit trying to get this pdf converted.
