I am trying to automate the process of extracting data from my bank statements and introduce them into excel. I am aware that there are bank API's that facilitate getting this data directly but for this use case, I cannot use them.
I am using python3 and the PyPDF2 library. (I am using python3 because it the language I am most comfortable with, but I'm willing to try anything if there is one more suitable for my use case)
The code I have up to now:
# Importing required modules
import PyPDF2
def pdf_to_excel():
    # Creating a pdf file object
    pdfFileObj = open('Horses Jan 2020.pdf', 'rb')
    
    # Creating a pdf reader object
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    
    # Printing number of pages in pdf file
    print(pdfReader.numPages)
    # Printing Document Layout
    print(pdfReader.pageLayout)
    
    # Creating a page object
    pageObj = pdfReader.getPage(0)
    # Extract movement details into a dict
    movments = extract_movement_details(pageObj)
    
    # Closing the pdf file object
    pdfFileObj.close()
    
    return movments
def extract_movement_details(pageObj):
    
    # Extracting text from page
    pdftext = pageObj.extractText()
    # Split pdf text
    text_split = pdftext.split()
    """ Extract General Information from Bank Statment """
    # Extract General Information
    client_n = text_split[10].replace('Cliente', '').split('R', 1)
    general_info = {
        'From': text_split[1],
        'To': text_split[3].replace('Fecha', ''),
        'Account No.': text_split[8].replace('Cuenta', '').replace('No.', ''),
        'Client No.': client_n[0],
        'R.F.C': client_n[1].replace('.F.C', '').replace('No.', ''),
        'No. Key Account': text_split[12].replace('CLABE', '').replace('PAGINA', '')
        }
    
    # Extract movments information
    movement_details = {
        'Date Opt': 0,
        'Date Liq': 0,
        'Description': 0,
        'Reference': 0,
        'Debit': 0,
        'Credit': 0,
        'Operation': 0,
        'Liquidation': 0
        }
    print(general_info)
    print(movement_details)
    print("\n")
    
    # Printing the text from the page
    print(text_split)
    
    movments = text_split
    return movments
pdf_to_excel()
It uses PyPDF2 and consists of two functions, one that opens the file and one that splits the text and extracts the data I need. Basically, I am trying to recognize patterns and extract the data in accordance with those patterns. However, it is proving to be much harder than I thought it would be because the data does not seem to be organized in a logical way on the list I got. Words are stuck together, they are not in the same order as they are visually on excel, and recognizing the patterns is proving quite difficult.
Unfortunately, since it's banking information, I am not sure how I can display it here without giving away sensitive information but I will work on that and try to post the text extracted and the way the docs look. For now here is a picture with the sensitive information redacted, 
Moreover, before trying with PyPDF2 I tried with tabula but since the movements are in a table with invisible borders it did not recognize them as tables.
Also, I will have to do this with many more banks in the future so if what I want can be done with a technology with a larger learning curve I am okay with that.
