Team,
I have a PDF file with 6000+ pages. What is the fastest way to extract the text from it?
I am currently using this code:
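import pdfplumber

page_texts = []
with pdfplumber.open(pdf_dir) as pdf:
    for page in pdf.pages:
        # extract_text() may give None for a page without text, so fall back to ""
        page_texts.append(page.extract_text() or "")
# join once at the end, with a newline so lines from adjacent pages stay separate
all_text = "\n".join(page_texts)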
but it takes a long time to complete.
After extracting, I then need to search for the address lines, for which I am using this code:
import re

address_line = re.compile(r'(:  \d{5})')
# search the full extracted text (all_text), not only the text of the last page
for line in all_text.split('\n'):
    if address_line.search(line):
        print(line)
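A possible variation would be to run the search page by page as the text is extracted, so nothing depends on one large string. This is only a minimal sketch: `iter_address_lines` is a hypothetical helper name (not part of pdfplumber), and it assumes the page texts arrive as an iterable of strings, with None or an empty string for pages that have no text.

```python
import re

# Same pattern as in the post: a colon, two spaces, then five digits.
ADDRESS_LINE = re.compile(r'(:  \d{5})')

def iter_address_lines(page_texts):
    """Yield every line matching ADDRESS_LINE, scanning one page at a time."""
    for text in page_texts:
        if not text:  # skip pages whose text is None or empty
            continue
        for line in text.split('\n'):
            if ADDRESS_LINE.search(line):
                yield line

# Stand-in data for illustration; with pdfplumber the input would be
# (page.extract_text() for page in pdf.pages) inside the `with` block.
sample_pages = ["Name: John\nZip:  12345 Main St", None, "no match here"]
for line in iter_address_lines(sample_pages):
    print(line)
```

This does not make the extraction itself faster, but it keeps the search correct for every page and avoids building the whole text in memory first.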
Appreciate your help in advance :)