I am trying to iterate through a number of PDF files within a folder on my desktop. My goal is to read the text from each of these PDFs (they are all only one page long) and place each distinct PDF's text into a new row within one dataframe.
I have tried looping through the folder, and the loop does extract the text from every PDF in it (I created a folder with two "test" PDFs to check the code), but it fails to combine the text into one single dataframe. I would like the code to produce a single dataframe with one row per PDF's text so that I can export it to a CSV afterward. Instead, I get two separate dataframes, and their text is not carried over when I export to a CSV. In fact, I believe the code overwrites the dataframe on every iteration, so only the last one created survives as the object called "df". Any help would be greatly appreciated. I hope this query is clear enough; I have seen related threads but have not been able to find one that solves this exact issue.
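import os
import fitz            # PyMuPDF
import pandas as pd

rootdir = 'directory file path'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        doc = fitz.open(os.path.join(subdir, file))  # open via full path, not the bare file name
        page = doc[0]                     # every PDF is a single page
        text = page.getText("text")
        text_list = []                    # create list to store text in (re-created on every iteration)
        text_list.append(text)            # append the text to the list
        df = pd.DataFrame(text_list)      # create a df from the list (replaces the previous df)
        df.columns = ['text']
        doc.close()
        print(df)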
Output is below:
         text
0  Dummy PDF file\n
                                                text
0   \n \n \n \n \n \nThis is a test PDF document....
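For comparison, here is a minimal sketch of what I am aiming for, not a confirmed solution. It assumes the fix is to collect the rows in one list created before the loop and to build the dataframe only once, after the loop has finished. The function names (collect_pdf_texts, read_first_page, pdfs_to_dataframe), the extra 'file' column, and the output path are hypothetical names I made up for illustration. It also assumes a PyMuPDF version where the page method is called get_text (older releases spell it getText, as in my code above).

```python
import os


def collect_pdf_texts(rootdir, read_first_page):
    """Walk rootdir and return one dict per PDF: {'file': name, 'text': text}.

    read_first_page is any callable that maps a file path to a string.
    """
    rows = []  # created once, outside the loops, so nothing is overwritten
    for subdir, dirs, files in os.walk(rootdir):
        for name in sorted(files):
            if not name.lower().endswith(".pdf"):
                continue  # skip anything that is not a PDF
            path = os.path.join(subdir, name)  # full path to the file
            rows.append({"file": name, "text": read_first_page(path)})
    return rows


def read_first_page(path):
    """Return the text of the first page of a PDF using PyMuPDF."""
    import fitz  # PyMuPDF; imported here so the walker has no hard dependency

    doc = fitz.open(path)
    try:
        return doc[0].get_text("text")
    finally:
        doc.close()


def pdfs_to_dataframe(rootdir, out_csv):
    """Build a single dataframe (one row per PDF) and export it to a CSV."""
    import pandas as pd

    rows = collect_pdf_texts(rootdir, read_first_page)
    df = pd.DataFrame(rows, columns=["file", "text"])  # built once, after the loop
    df.to_csv(out_csv, index=False)
    return df
```

Under those assumptions, calling pdfs_to_dataframe('directory file path', 'output.csv') should return one dataframe with a row for each PDF instead of a fresh one-row dataframe per file.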