I have a file like this, containing sentences delimited by BOS (Begin Of Sentence) and EOS (End Of Sentence) markers:
BOS 1
1 word \t\t word \t word \t\t word \t 123
1 word \t\t word \t word \t\t word \t 234
1 word \t\t word \t word \t\t word \t 567
EOS 1
BOS 2
2 word \t\t word \t word \t\t word \t 456
2 word \t\t word \t word \t\t word \t 789
EOS 2
I also have a second file, where the first number on each line is the sentence number:
1, 123, 567
2, 789
I want to read both files and, for each line of the first file, check whether the number at the end of the line is listed in the second file for the same sentence. If so, I want to replace only the fourth word of that line in the first file. So, the expected output is:
BOS 1
1 word \t\t word \t word \t\t NEW_WORD \t 123
1 word \t\t word \t word \t\t word \t 234
1 word \t\t word \t word \t\t NEW_WORD \t 567
EOS 1
BOS 2
2 word \t\t word \t word \t\t word \t 456
2 word \t\t word \t word \t\t NEW_WORD \t 789
EOS 2
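The rule above can be sketched for a single line as follows. This is only a rough sketch under some assumptions: the fields are separated by runs of spaces and/or real tab characters, every data line has a sentence number, four words and a final number, and the contents of the second file are already available as a dictionary. The names `LINE_RE`, `rewrite_line` and `wanted` are hypothetical and chosen just for illustration.

```python
import re

# Assumed line layout: sentence number, four words, then a number,
# separated by runs of spaces and/or real tab characters.
LINE_RE = re.compile(r'^(\s*(\d+)\s+(?:\S+\s+){3})\S+(\s+(\d+)\s*)$')


def rewrite_line(line, wanted, new_word="NEW_WORD"):
    """Replace the fourth word if the line's last number is wanted.

    `wanted` maps a sentence number to the set of numbers listed for it,
    both kept as strings, e.g. {"1": {"123", "567"}}.
    """
    m = LINE_RE.match(line)
    if m is None:  # BOS/EOS markers and anything else stay untouched
        return line
    prefix, sentence, suffix, number = m.groups()
    if number in wanted.get(sentence, ()):
        # Keep the original whitespace on both sides of the replaced word.
        return prefix + new_word + suffix
    return line


# Hypothetical lookup, written out by hand from the second sample file.
wanted = {"1": {"123", "567"}, "2": {"789"}}
print(repr(rewrite_line("1 word\t\tword\tword\t\tword\t123\n", wanted)))
print(repr(rewrite_line("1 word\t\tword\tword\t\tword\t234\n", wanted)))
```

Capturing the text before and after the fourth word means the original tabs are written back unchanged, which a plain `split()` and `join()` would not guarantee.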
First, I'm not sure how to read the two files together, because they have different numbers of lines. Second, I don't know how to iterate over the lines of one sentence in the first file (e.g. the first sentence) while also iterating over the values in the corresponding line of the second file to compare them. This is what I have so far:
import re
import sys


def readText(filename1, filename2):
  with open(filename1) as f1, open(filename2) as f2:
    data1 = f1.readlines()  # the first file
    data2 = f2.readlines()  # the second one
  list2 = []  # a list to store the values of the second file
  result = []  # the (possibly rewritten) lines of the first file
  # Problem: zip pairs line i of the first file with line i of the second file
  # and stops at the shorter one, but one line of the second file belongs to a
  # whole sentence (several lines) of the first file.
  for line1, line2 in zip(data1, data2):
    l2 = line2.strip().split(', ')
    # find the fourth word in a line, followed by a tab and a three-digit number
    find = re.findall(r'.*word\t\d\d\d', line1)
    list2.extend(l2)
    for match in find:
      m = match.split()  # [sentence number, word, word, word, word, number]
      if m[0] == list2[0]:  # for the same sentence number in the two files
        line1 = re.sub(r'(.*)word\t%s' % m[5], r'\1NEW_WORD\t%s' % m[5], line1)
    result.append(line1)
  return result


if len(sys.argv) == 3:
  lines = readText(sys.argv[1], sys.argv[2])
else:
  print("file.py inputfile1 inputfile2")
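On the different line counts: one possibility is not to iterate over the two files in parallel at all, but to load the second file into a lookup first and then loop over the first file on its own. The following is a minimal sketch of that idea, not a definitive solution. It assumes that the fields in the first file are separated by whitespace (real tabs), that data lines have exactly six tokens, and that the second file is comma-separated as shown. The names `load_wanted`, `rewrite` and `main` are hypothetical.

```python
import re
import sys


def load_wanted(lines):
    """Map each sentence number to the set of numbers listed after it."""
    wanted = {}
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if fields[0]:  # skip blank lines
            wanted.setdefault(fields[0], set()).update(fields[1:])
    return wanted


def rewrite(lines, wanted, new_word="NEW_WORD"):
    """Yield the lines of the first file, replacing the fourth word where needed."""
    for line in lines:
        tokens = line.split()
        # Data lines are assumed to have exactly six whitespace-separated tokens:
        # sentence number, four words, trailing number.
        if len(tokens) == 6 and tokens[5] in wanted.get(tokens[0], ()):
            # Swap the word just before the trailing number, keeping whitespace.
            line = re.sub(r'\S+(\s+\d+\s*)$',
                          lambda m: new_word + m.group(1), line)
        yield line


def main(path1, path2):
    # Read the (small) second file completely, then stream the first file.
    with open(path2) as f2:
        wanted = load_wanted(f2)
    with open(path1) as f1:
        sys.stdout.writelines(rewrite(f1, wanted))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Because both helper functions accept any iterable of strings, they can be tried on plain lists of lines before being pointed at real files. BOS and EOS lines have only two tokens, so they pass through unchanged.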
Thanks in advance for any help!