I'm practicing working with large data sets on AWS using MapReduce and Python.
I have the following mapper code:
    import sys
    import re
    import csv
    import glob
    import string
    #class MyDialect(csv.Dialect):
        #strict = True
        #skipinitialspace = False
        #quoting = QUOTE_MINIMAL
        #delimiter = ','
        #quotechar = '"'
    for line in sys.stdin:
        csv.reader(line, dialect='excel')
        #reader = csv.reader(line, delimiter=',', quoting=csv.QUOTE_ALL,  quotechar='"')
        #line = line.strip()
        #unpacked = line.split(",")
        try:
          # regular expression
          num,title,year,length,budget,rating,votes,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,mpaa,Action,Animation,Comedy,Drama,Documentary,Romance,Short = line.split(",")
          if float(rating) <= 1:
            results = [votes, rating, title, year]
            print("\t".join(results))
        except ValueError:
          pass
I know this isn't perfect: right now it produces its output from the raw line value. However, whenever I try to use the csv module on the line and print the result, I get
<_csv.reader object at 0x7fc2c184e280>
for every line.
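
For reference, here is a minimal sketch of what that output likely means. `csv.reader` returns a lazy reader object, so printing it shows only its repr; the parsed fields appear once the reader is iterated. The reader also expects an iterable of lines, so a single string has to be wrapped in a list, otherwise each character is treated as its own line. The sample row below is made up for illustration and is not taken from the real data set.

```python
import csv

# Made-up sample row for illustration only (not real data).
line = '7,"Blair Witch, The",1999\n'

# csv.reader expects an iterable of lines, so wrap the single line in a list.
reader = csv.reader([line], dialect='excel')

# Printing the reader shows the object itself, not the parsed fields.
print(reader)

# Iterating (here with next) is what actually parses a row.
row = next(reader)
print(row)
```

The quoted title comes back as one field with its comma intact.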
I need to read the input line by line and write the output to stdout, since this is one node processing the data and passing it to the reducer. I have most of the bugs worked out, but it doesn't handle titles with a comma in them. For example, "Blair witch, the" is skipped and never shown in the list, I believe because the extra comma shifts the fields so that the budget becomes the rating and the rating becomes the votes.
Any idea on how to do this?
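
One possible direction, sketched under assumptions rather than offered as a definitive fix: let `csv.reader` consume the line stream directly instead of splitting each line on commas. With a plain split, a quoted title containing a comma produces 26 pieces instead of 25, so the tuple unpacking raises `ValueError` and the row is silently skipped. The names `map_rows`, `main`, and `EXPECTED_FIELDS` are hypothetical, and the column positions are assumed from the order of the names unpacked in the code above.

```python
import csv
import sys

# Assumption: 25 columns, matching the names unpacked in the original code.
EXPECTED_FIELDS = 25


def map_rows(lines):
    """Yield 'votes<TAB>rating<TAB>title<TAB>year' for rows rated <= 1."""
    for row in csv.reader(lines, dialect='excel'):
        if len(row) != EXPECTED_FIELDS:
            continue  # skip rows that do not have the expected shape
        # Assumed positions: num=0, title=1, year=2, length=3,
        # budget=4, rating=5, votes=6.
        title, year, rating, votes = row[1], row[2], row[5], row[6]
        try:
            if float(rating) <= 1:
                yield "\t".join([votes, rating, title, year])
        except ValueError:
            continue  # e.g. a header row or a non-numeric rating


def main():
    # sys.stdin is an iterable of lines, which is what csv.reader expects.
    for out in map_rows(sys.stdin):
        print(out)
```

To use this as the mapper, `main()` would be called at the end of the script. Keeping the parsing in a generator that accepts any iterable of lines makes it possible to try it on a small list of strings before running it on the cluster.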
 
    