I have found variants of solutions to this question on Stack Overflow, but I still can't figure out why my use case does not work.
I have a large number of .LAS files that I want to concatenate. These are column-based text files with a similar format.
Text Example:
# Produced by RMS 12
~VERSION
VERS        .          2                       :CWLS Log Ascii Standard - Version 2
WRAP        .          NO                      :One line per depth step
DLM         .          SPACE                   :Delimiting character (SPACE, TAB or COMMA)
~WELL
# Keyword.Unit         Data Type               Information
# ------------         ---------               -------------------------------
WELL        .          C-34B                   :WELL
STRT        .U.S. ft   53.8099                 :First index value
STOP        .U.S. ft   16129.8099              :Last index value
STEP        .U.S. ft   0.5000                  :Step of index
NULL        .          -999.25                 :NoValue
FLD         .                                  :FIELD
COMP        .                                  :COMPANY
CTRY        .                                  :COUNTRY
DATE        .                                  :DATE DD/MM/YYYY
SRVC        .                                  :SERVICE COMPANY
LOC         .                                  :LOCATION
PROV        .                                  :PROVINCE
UWI         .                                  :UNIQUE WELL ID
X           .U.S. ft   664504.000              :X-coordinate of Well Head
Y           .U.S. ft   5959656.990             :Y-coordinate of Well Head
RKB         .U.S. ft   83.3200                 :RKB
WATER_DEPTH .U.S. ft   0.0000                  :Seabed or ground level
~PARAMETER
# Keyword.Unit         Value                   Description
# ------------         ---------               -------------------------------
~CURVE
# Name.Unit                                    Curve Description
# ---------                                    -----------------
MD          .U.S. ft                           :1 Index
ZoneLog_2018_JAO.                                  :2
~ASCII MD             ZoneLog_2018_JAO
53.8099        -999.25        
54.3099        -999.25        
54.8099        -999.25        
55.3099        -999.25        
55.8099        -999.25        
I would like to add the file name ("01-01" in this case) as the first column and then concatenate the data below the ~ASCII MD row.
I.e. the data would look like this if I had more than one file:
Well Name   MD  _ZoneLog_JAO
01-01A  5000    9
01-01A  5001    9
01-01A  5002    10
01-01B  4999    9
01-01B  5000    9
01-01B  5001    10
01-01C  4856    8
01-01C  4857    9
01-01C  4859    10
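To check that I understand the parsing step in isolation, here is a minimal standalone sketch. The sample rows are pasted into a string so it runs on its own, the well name "01-01A" is hard-coded where the real script would use the filename stem, and sep=r"\s+" is my guess at how to handle the space-padded columns:

```python
import io

import pandas as pd

# A few data rows like the ones below ~ASCII MD, pasted in so this runs
# standalone; the last row has a made-up zone value instead of the -999.25 NULL.
sample = """\
53.8099        -999.25
54.3099        -999.25
54.8099        -999.25
55.3099        42.0
"""

# sep=r"\s+" splits on runs of whitespace, which should match the space-padded
# columns; names= supplies headers since the ~ASCII line would be skipped.
frame = pd.read_csv(io.StringIO(sample), sep=r"\s+",
                    names=["MD", "ZoneLog_2018_JAO"])

frame = frame[frame["ZoneLog_2018_JAO"] != -999.25]  # drop NoValue rows
frame.insert(0, "Well Name", "01-01A")  # would be os.path.splitext(name)[0]
print(frame)
```

If this is the right idea, the real loop would just do the same per file path.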
I have attempted the script below to accomplish this (adapted from Import multiple csv files into pandas and concatenate into one DataFrame):
import pandas as pd
import os
import glob
merged_files = glob.glob("*.las")  # creates a list of all LAS files
data = []  # pd.concat takes a list of DataFrames as an argument
for LAS in merged_files:
    frame = pd.read_csv(LAS, skiprows=37)
    frame['filename'] = os.path.splitext(LAS)[0]
    data.append(frame)
merge = pd.concat(data, ignore_index=True)  # don't want pandas to try and align row indexes
merge.to_csv("Merged_FFM15_ZoneLogs.txt", index=False)
print(merge.columns)
print(merge.shape)
This creates an output file called "Merged_FFM15_ZoneLogs.txt" that kind of gets me there, but the formatting looks a little odd:
Index([u'-0.0000        -999.25        ', u'0.0000         -999.25        ',
       u'filename', u'~ASCII MD             ZoneLog_2018_JAO'],
      dtype='object')
(81456352, 4)
I am not sure why the index ends up like this. I think it might be caused by one spurious file, but since I am looking at around 1,000 files I am not sure how to track it down. Maybe it has to do with the irregular delimiter? I am a bit at a loss.
It also only seems to work with a small number of the files instead of the 1,000-plus that I need to combine. One workaround for the file size that I found is to run it on 64-bit Python instead of the 32-bit option. Is this the only way to do this?
In a perfect world I would be able to keep only certain columns in the final file and also filter out certain rows (i.e. rows that have -999.25 in the ZoneLog column). In the variants of this code that I have tried, I ran into MemoryError and TypeError: cannot concatenate object of type '<class 'pandas.io.parsers.TextFileReader'>'; only Series and DataFrame objs are valid.
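For reference, my understanding of where that TypeError comes from: read_csv with chunksize= returns a TextFileReader, which has to be iterated rather than passed to pd.concat directly. A toy sketch of that pattern, using made-up in-memory data rather than my real files:

```python
import io

import pandas as pd

# Ten made-up rows standing in for one LAS file's data section.
sample = "\n".join(f"{md} {md % 3}" for md in range(10))

# chunksize makes read_csv return a TextFileReader; passing the reader
# itself to pd.concat raises the TypeError above -- iterate it instead.
reader = pd.read_csv(io.StringIO(sample), sep=r"\s+",
                     names=["MD", "Zone"], chunksize=4)

# Filter each chunk before collecting it, so only the kept rows stay in memory.
pieces = [chunk[chunk["Zone"] != 0] for chunk in reader]
merged = pd.concat(pieces, ignore_index=True)
print(merged.shape)
```

Is iterating chunks like this the right way to keep memory bounded, or do I still need 64-bit Python for 1,000+ files?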
