I know this thread has already been answered, but I have a similar problem (finding the CIGAR operations that "consume query"). I'm trying to sum all of the integers preceding a character like 'S', 'M', 'I', '=', 'X', or 'H' in order to find the read length from a paired-end read's CIGAR string.
I wrote a Python script that reads column 6 of a SAM/BAM file (the CIGAR field, $6 in awk) from standard input, one CIGAR string per line:
import re                       # regular expression module
import sys                      # standard input

# Sum every integer matching the pattern 101M, 1S, 70X, etc.
# Note: H (hard clip) bases are not stored in SEQ; remove H from the
# character class if you want the length of SEQ instead.
# Example inputs and outputs:
# "49M1S" produces total=50
# "10M757N40M" produces total=50 (the 757N is skipped)
# read_id starts at 1 and complements the id from filter_1.txt
for read_id, line in enumerate(sys.stdin, start=1):
    total = sum(int(n) for n in re.findall(r'(\d+)[SMI=XH]', line))
    print(str(read_id) + ' ' + str(total))
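If you need to choose between the original read length (hard clips counted) and the length of SEQ (hard clips left out), the same regex idea can be wrapped in a small function. This is only a sketch: the name cigar_read_length and the include_hard_clips flag are my own hypothetical choices, not part of any library.

```python
import re

# Digits followed by an operation that contributes bases to the read.
# H is handled separately because hard-clipped bases are not stored in SEQ.
WITH_HARD_CLIPS = re.compile(r"(\d+)[SMI=XH]")
WITHOUT_HARD_CLIPS = re.compile(r"(\d+)[SMI=X]")

def cigar_read_length(cigar, include_hard_clips=True):
    """Sum the lengths of the CIGAR operations that make up the read."""
    pattern = WITH_HARD_CLIPS if include_hard_clips else WITHOUT_HARD_CLIPS
    return sum(int(n) for n in pattern.findall(cigar))

print(cigar_read_length("10M757N40M"))                       # N is ignored
print(cigar_read_length("5H45M"))                            # hard clip counted
print(cigar_read_length("5H45M", include_hard_clips=False))  # SEQ length only
```

An unavailable CIGAR ("*") contains no digits, so it gives 0 rather than an error.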
The purpose of read_id is to number each read in input order, so that every output line is unique and you can print the read lengths beside awk-ed columns from the same BAM file.
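An alternative to a running read_id is to take the read name straight from the alignment line. The sketch below assumes tab-separated SAM text (for example, what samtools view prints), with the read name in column 1 and the CIGAR string in column 6. The function name read_lengths_from_sam is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)[SMI=XH]")

def read_lengths_from_sam(lines):
    """Yield (read_name, read_length) for each alignment line of SAM text."""
    for line in lines:
        if line.startswith("@"):          # header line, not an alignment
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:               # too short to hold a CIGAR column
            continue
        qname, cigar = fields[0], fields[5]
        yield qname, sum(int(n) for n in CIGAR_OP.findall(cigar))

if __name__ == "__main__":
    import sys
    for name, length in read_lengths_from_sam(sys.stdin):
        print(name + "\t" + str(length))
```

Both mates of a pair share the same read name, so you may still want the flag column if you need to tell them apart.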
I hope this helps, or at least helps the next user who has a similar issue.
I consulted https://stackoverflow.com/a/11339230 for reference.