To emphasize, I do not want to "parse using a regex" - I want to "parse a regex into a symbolic tree." (Searching has only brought up the former...)
My use case: to speed up a regex search over a database, I'd like to parse a regex like (foo|bar)baz+(bat)* and pull out all substrings that MUST appear in every match. (In this case, it's just baz, because foo and bar are alternatives and bat can appear zero times.)
To do this, I need some understanding of regex operators/semantics. re.DEBUG comes closest (shown here on the same pattern without the trailing *):
In [7]: re.compile('(foo|bar)baz+(bat)', re.DEBUG)
subpattern 1
  branch
    literal 102
    literal 111
    literal 111
  or
    literal 98
    literal 97
    literal 114
literal 98
literal 97
max_repeat 1 4294967295
  literal 122
subpattern 2
  literal 98
  literal 97
  literal 116
However, this only prints the tree, and as far as I can tell the C implementation doesn't preserve the structure afterwards. Any ideas on how I can get at this tree without writing my own parser?
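One avenue that might work: in CPython the pattern parser itself appears to be written in Python, in an internal, undocumented module (sre_parse, exposed as re._parser from Python 3.11 on), and its parse() function returns the same tree that re.DEBUG prints. Below is a minimal sketch of the extraction I'm after, assuming that module and its (op, argument) tuple layout stay as they are today, and assuming no case-insensitive flags are in play. The names required_literals, walk and flush are my own, not part of any library. Anything the sketch doesn't recognise (branches, character classes, anchors, lookarounds) simply ends the current literal run and contributes nothing, so the result should be conservative rather than complete.

```python
try:
    # Python 3.11+: private module, undocumented and subject to change
    from re import _parser as P
except ImportError:
    # Older Pythons expose the same parser as sre_parse
    import sre_parse as P


def required_literals(pattern):
    """Return literal substrings that must occur in every match of `pattern`.

    Conservative sketch: it may miss some required substrings, but it
    should not report a substring that is optional.
    """
    required = []  # completed runs of mandatory, contiguous literals
    run = []       # literal run currently being accumulated

    def flush():
        # Close the current run, if any, and record it
        if run:
            required.append("".join(run))
            del run[:]

    def walk(items):
        for op, av in items:
            if op == P.LITERAL:
                run.append(chr(av))
            elif op == P.SUBPATTERN:
                # The group's contents are the last element of the tuple
                # (the tuple length differs between Python versions)
                walk(av[-1])
            elif op in (P.MAX_REPEAT, P.MIN_REPEAT):
                lo, _hi, body = av
                if lo >= 1:
                    # The body occurs at least once, right after the run so far
                    walk(body)
                # Further repetitions may follow, so the run ends here
                flush()
            else:
                # BRANCH, IN, ANY, AT, ASSERT, ...: nothing guaranteed
                flush()

    walk(P.parse(pattern))
    flush()
    return required


print(required_literals("(foo|bar)baz+(bat)*"))  # ['baz']
```

On the pattern from the re.DEBUG session above, (foo|bar)baz+(bat), the same function should also report bat, since that group is no longer optional.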
 
     
     
     
    