I am working on parsing pseudo-S-expressions with recursive regexes in Ruby.
After some searching, I started using the regular expression from the answer to "Matching balanced parenthesis in Ruby using recursive regular expressions like perl". The regex matches correctly, but the results behave strangely. If I call `match` on any of the results, the resulting match reports the entire tested string, no matter which regex I pass to `match`. If I explicitly replace one of the initial results with a string literal, `match` works as expected for that result. However, calling `class` on the result entry confirms that it is a plain `String`. What on earth is going on here?
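For comparison, here is a minimal check of two `MatchData` accessors on a plain string literal (the `md` name is only illustrative). As documented for Ruby's `MatchData`, `#string` returns the whole string that `match` was called on, while `#[0]` returns only the matched text:

```ruby
# Match the first digit in a literal S-expression.
md = "(def foo 10)".match(/\d/)

puts md.string # the entire string passed to #match: (def foo 10)
puts md[0]     # only the matched text: 1
```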
```ruby
src = "(def foo 10) (+ foo 4 12)"

def parse(exp)
  expression = %r{
    (?<re>
      \(
      (?:
        (?> [^()]+ )
        |
        \g<re>
      )*
      \)
    )
  }x
  trans = ""
  exp.scan(expression) { |m|
    m[0].match(/\d/) { |m|
      trans += m.string
    }
  }
  return trans
end
```
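To see exactly what `scan` hands to the block, here is a small standalone sketch. It reuses the pattern from `parse` and prints the scan results, their class, and what `MatchData#string` and `MatchData#[0]` return for the inner `/\d/` match. The names `groups`, `captures`, `sexp` and `md` are illustrative only:

```ruby
src = "(def foo 10) (+ foo 4 12)"

# The same recursive pattern used in parse above.
expression = %r{
  (?<re>
    \(
    (?:
      (?> [^()]+ )
      |
      \g<re>
    )*
    \)
  )
}x

# Because the pattern contains a (named) capture group, scan yields one
# array of captures per match rather than a plain string.
groups = src.scan(expression)
p groups # => [["(def foo 10)"], ["(+ foo 4 12)"]]

groups.each do |captures|
  sexp = captures[0]
  md = sexp.match(/\d/)
  # md.string is the whole sexp; md[0] is only the first digit found.
  puts "#{sexp.class}: string=#{md.string.inspect} [0]=#{md[0].inspect}"
end
```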
Of course, this isn't even close to complete parsing code. I also know it's not a great idea to try to parse code robustly with regexes, but I'm not trying to make a robust solution, just a proof of concept.
Does anyone know what's causing these regexes to misbehave?