I am trying to parse, in Python, a text file from the AER (Alberta Energy Regulator) that lists the well licenses issued each day in Alberta. I want to split the data for each license into the fields named in the file header (well name, unique identifier, license number, etc.) and collect those fields in a list that can then be loaded into a database.
The problem is that the file's format (a section of it is shown below) is not friendly to parsing: there is no delimiter, and the layout is meant to be read by humans. My experience with string manipulation is limited, and I do not know how to approach this.
Here is a snippet of the text file in question:
    DATE: 02 July 2019                                                                                  
    --------------------------------------------------------------------------------------------        
    WELL NAME               LICENCE NUMBER         MINERAL RIGHTS       GROUND ELEVATION                
    UNIQUE IDENTIFIER       SURFACE CO-ORDINATES   BOARD FIELD CENTRE   PROJECTED DEPTH                 
    LAHEE CLASSIFICATION    FIELD                                       TERMINATING ZONE                
    DRILLING OPERATION      WELL PURPOSE           WELL  TYPE           SUBSTANCE                       
    LICENSEE                                                            SURFACE LOCATION                
    --------------------------------------------------------------------------------------------        
    MEG K7N HARDY 4-7-77-5               0483923   ALBERTA CROWN        571.7M                          
    106/04-07-077-05W4/02  S  572.4M  W  278.3M    BONNYVILLE           1600.0M                         
    DEV (NC)                             HARDY                          MCMURRAY FM                     
    HORIZONTAL                           RESUMPTIONPRODUCTION (SCHEME)  CRUDE BITUMEN                   
    MEG ENERGY CORP.                                                    09-07-077-05W4                  
    SPL 11-24 HZ MARTEN 14-25-76-6       0494994   ALBERTA CROWN        705.3M                          
    100/14-25-076-06W5/00  S  566.0M  E  800.6M    ST. ALBERT           2700.0M                         
    OUT (C)                              MARTEN                         CLEARWATER FM                   
    HORIZONTAL                           NEW       PRODUCTION           CRUDE OIL                       
    SPUR PETROLEUM LTD.                                                 11-24-076-06W5                  
    SPL 10-24 HZ MARTEN 5-23-76-6        0494995   ALBERTA CROWN        705.5M                          
    100/05-23-076-06W5/00  S  566.3M  W  800.1M    ST. ALBERT           2700.0M                         
    OUT (C)                              MARTEN                         CLEARWATER FM                   
    HORIZONTAL                           NEW       PRODUCTION           CRUDE OIL                       
    SPUR PETROLEUM LTD.                                                 10-24-076-06W5                  
    SURGE ENERGY HZ103 VALHALLA 6-7-75-8 0494996   ALBERTA CROWN        770.8M                          
    103/06-07-075-08W6/00  S  372.0M  E  324.5M    GRANDE PRAIRIE       3350.0M                         
    DEV (NC)                             VALHALLA                       DOIG FM                         
    HORIZONTAL                           NEW       PRODUCTION           CRUDE OIL                       
    SURGE ENERGY INC.                                                   13-06-075-08W6                  
    CNRL ET AL HZ KARR 4-16-66-3         0494997   ALBERTA CROWN        770.7M                          
    100/04-16-066-03W6/00  N  623.4M  E  127.5M    GRANDE PRAIRIE       5295.0M                         
    DEV (NC)                             KARR                           DUNVEGAN FM                     
    HORIZONTAL                           NEW       PRODUCTION           CRUDE OIL                       
    CANADIAN NATURAL RESOURCES LIMITED                                  05-14-066-03W6     
I do not need the date or the header information between the dashed lines. For each five-line block, I need to extract only the text of each field, as laid out by the header. I have tried several approaches, including basic string manipulation in Python and regular expressions, but none have come close and I am at a loss. Let me know if you need more detail on the task; I understand that this is a big ask and a bit convoluted.
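To make the goal concrete, here is a rough sketch of the kind of fixed-width slicing that might apply, since the fields appear to start at the same character position in every block. It is only an illustration under several assumptions: the column offsets in `LAYOUT` are inferred by eye from the sample above (with the 4-space display indent removed) and would need checking against the real file; the field names and the `parse_licences` function are hypothetical; and the file is assumed to contain one header followed only by five-line blocks.

```python
# Column boundaries (0-based, end-exclusive; None means "to end of line") for
# each of the five lines in a licence block. ASSUMPTION: these offsets are
# inferred from the sample snippet and may not hold for every file.
LAYOUT = [
    [("well_name", 0, 37), ("licence_number", 37, 47),
     ("mineral_rights", 47, 68), ("ground_elevation", 68, None)],
    [("unique_identifier", 0, 23), ("surface_coordinates", 23, 47),
     ("board_field_centre", 47, 68), ("projected_depth", 68, None)],
    [("lahee_classification", 0, 37), ("field", 37, 68),
     ("terminating_zone", 68, None)],
    [("drilling_operation", 0, 37), ("well_purpose", 37, 47),
     ("well_type", 47, 68), ("substance", 68, None)],
    [("licensee", 0, 68), ("surface_location", 68, None)],
]


def parse_licences(lines, indent=0):
    """Yield one dict per five-line licence block.

    `lines` is any iterable of strings (for example an open file).
    `indent` is the number of leading characters to drop from every line.
    """
    rules_seen = 0  # dashed rules passed so far; data starts after the second
    block = []
    for raw in lines:
        line = raw.rstrip("\r\n")[indent:]
        if rules_seen < 2:
            # Skip the date line, both dashed rules, and the header between them.
            if line.startswith("-----"):
                rules_seen += 1
            continue
        if not line.strip():
            continue  # ignore blank lines between or after blocks
        block.append(line)
        if len(block) == len(LAYOUT):
            record = {}
            for text, fields in zip(block, LAYOUT):
                for name, start, end in fields:
                    # Slicing past the end of a short line simply yields "".
                    record[name] = text[start:end].strip()
            yield record
            block = []
```

With something like this, `list(parse_licences(open(path)))` would give one dictionary per license, which could then be turned into database rows. Slicing by position rather than splitting on whitespace is what keeps values such as `RESUMPTION` and `PRODUCTION (SCHEME)` apart even though they touch in the file.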
 
    