My first post:
Before beginning, I should note that I am relatively new to OOP, though I have done DB/stat work in SAS, R, etc., so my question may not be well posed; please let me know if I need to clarify anything.
My question:
I am attempting to import and parse large CSV files (~6 million rows, with larger files likely to come). The two limitations I've run into repeatedly have been runtime and memory (I'm on a 32-bit implementation of Python). Below is a simplified version of my neophyte (nth) attempt at importing and parsing in reasonable time. How can I speed up this process? Because of the memory limitations, I split the file as I import it and perform interim summaries, using pandas for the summarization:
Parsing and Summarization:
import pandas as pd

def ParseInts(inString):
    # Return None for anything that doesn't convert cleanly to an integer
    try:
        return int(inString)
    except (ValueError, TypeError):
        return None
def TextToYearMo(inString):
    # Convert a 'YYYY-MM-...' date string to an integer YYYYMM;
    # the fallback handles single-digit months ('YYYY-M-...')
    try:
        return 100*int(inString[0:4]) + int(inString[5:7])
    except ValueError:
        return 100*int(inString[0:4]) + int(inString[5:6])
def ParseAllElements(elmValue, elmPos):
    # Positions 0, 2, and 5 stay as text; position 3 is a date; everything else is an integer
    if elmPos in [0, 2, 5]:
        return elmValue
    elif elmPos == 3:
        return TextToYearMo(elmValue)
    elif elmPos == 18:
        # The last field in the raw row still carries the newline
        return ParseInts(elmValue.strip('\n'))
    else:
        return ParseInts(elmValue)
def MakeAndSumList(inList):
    # Build a frame from the parsed rows, then collapse to sums keyed on the first five columns
    df = pd.DataFrame(inList, columns = ['x1','x2','x3','x4','x5',
                                         'x6','x7','x8','x9','x10',
                                         'x11','x12','x13','x14'])
    return df.groupby(['x1','x2','x3','x4','x5']).sum().reset_index()
Function Calls:
import math
import pandas as pd
import pse  # the parsing/summarization functions above (module name assumed here)

def ParsedSummary(longString, delimtr, rowNum):
    # Positions of the raw columns I keep; their order here becomes x1..x14
    keepColumns = [0,3,2,5,10,9,11,12,13,14,15,16,17,18]
    #Do some other stuff that takes very little time
    return [pse.ParseAllElements(longString.split(delimtr)[i], i) for i in keepColumns]
def CSVToList(fileName, delimtr=','):
    with open(fileName) as f:
        # Materialize (rownum, line) pairs so each of the ten passes below can filter by row number
        listEnumFile = set(enumerate(f))
        lineCount = len(listEnumFile)
        maxSplit = math.floor(lineCount / 10) + 1
        Summary = pd.DataFrame({}, columns = ['x1','x2','x3','x4','x5',
                                              'x6','x7','x8','x9','x10',
                                              'x11','x12','x13','x14'])
        for counter in range(0, 10):
            startRow     = int(counter * maxSplit)
            endRow       = int((counter + 1) * maxSplit)
            includedRows = set(range(startRow, endRow))
            listOfRows = [ParsedSummary(row, delimtr, rownum)
                            for rownum, row in listEnumFile if rownum in includedRows]
            Summary = pd.concat([Summary, pse.MakeAndSumList(listOfRows)])
            listOfRows = []  # drop the chunk before building the next one
    return Summary
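For reference, this is roughly how I call it (the file name below is just a placeholder):
Summary = CSVToList('some_large_file.csv', delimtr=',')
print(Summary.head())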
(Again, this is my first question, so I apologize if I simplified too much or, more likely, too little, but I am at a loss as to how to expedite this.)
For runtime comparison:
Using Access, I can import, parse, summarize, and merge several files in this size range in under 5 minutes (though I am right at its 2 GB limit). I'd hope to get comparable results in Python; presently I'm estimating a ~30 minute runtime for one file. Note: I only threw something together in Access's miserable environment because I didn't have admin rights readily available to install anything else.
Edit: Updated parsing code. I was able to shave off five minutes (estimated runtime now ~25 minutes) by changing some conditional logic to try/except. Also, the runtime estimate doesn't include the pandas portion; I'd forgotten I'd commented that out while testing, but its impact seems negligible.
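To show the kind of change I mean, here is a sketch of the general before/after pattern (not my exact code):
# Sketch of the pattern only -- not the exact code I replaced
def ParseIntsConditional(inString):
    # Old style: test the string first, then convert
    if inString.strip().isdigit():
        return int(inString)
    return None

def ParseIntsTryExcept(inString):
    # New style: attempt the conversion and fall back if it fails
    try:
        return int(inString)
    except ValueError:
        return None
The try/except version only touches each value once, which I assume is where the savings came from.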