I'm trying to parse a string in the following format (EBNF, I hope this is right) in PHP:
<exp>      ::= <base>[{<modifier>["!"]"("<exp>")"}]
<base>     ::= <role>[{<modifier><role>}]
<modifier> ::= "&" | "|"
<role>     ::= ["!"]<str>[","<str>]
Where <str> is any string that would pass [a-zA-Z0-9\-]+
The following are examples of patterns that would have to be parsed:
token1
token1&token2
token1|(token2&!token3)
(token1&token2)|(token3&(token4|(!token5,12&token6)))
!(token1&token2|(token3&!token4))|token5,12
I am trying to write a RegEx pattern that would always give me four groups:
- The left-most <expression>. For the above examples this would be:
  - token1
  - token1
  - token1
  - token1&token2
  - token1&token2|(token3&!token4)

- Whether the ["!"] was present, i.e.:
  - null
  - null
  - null
  - null
  - !

- The <modifier> for the next <expression> (if any). This would be:
  - null
  - &
  - |
  - |
  - |

- The remainder of the pattern:
  - null
  - token2
  - token2&!token3
  - token3&(token4|(!token5,12&token6))
  - token5,12
 
I can parse this with the following pattern, provided that the first expression doesn't contain any <modifier>s:
^\(?(!?)([a-zA-Z0-9\-]+)\)?([&|]?)(.*)$
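To illustrate where it breaks, this is roughly how the pattern behaves with preg_match. The variable names are only for this example, and the comments show what I understand the captures to be:

```php
<?php
// The pattern from above, wrapped in delimiters for preg_match.
$pattern = '/^\(?(!?)([a-zA-Z0-9\-]+)\)?([&|]?)(.*)$/';

// Works: the first expression is a single token.
preg_match($pattern, 'token1&token2', $simple);
// $simple[1] = '', $simple[2] = 'token1', $simple[3] = '&', $simple[4] = 'token2'

// Breaks: the first expression is bracketed and contains a modifier.
preg_match($pattern, '(token1&token2)|token3', $nested);
// $nested[2] = 'token1', $nested[3] = '&', $nested[4] = 'token2)|token3'
// i.e. the bracketed group is cut at the first "&" instead of at its closing ")".
```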
I am stuck at this point. I have tried using lookarounds, but I can't figure out how to ensure that the group is only captured once all of its brackets are balanced. Is this achievable with RegEx, or do I need to write code using loops etc. to do this?
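For comparison, the loop-based fallback I have in mind would look roughly like the sketch below. It is only a sketch under a few assumptions: `splitExpression` and `matchingParen` are names made up for this example, a leading `!` is always reported separately, and a single pair of brackets that wraps the whole remainder is stripped (as in the examples above). It returns the four groups in the order listed earlier, or null for input it cannot split.

```php
<?php
// Hypothetical helper: returns the index of the ")" that closes the "(" at $open,
// or null if the brackets are unbalanced.
function matchingParen(string $s, int $open): ?int
{
    $depth = 0;
    for ($i = $open, $n = strlen($s); $i < $n; $i++) {
        if ($s[$i] === '(') {
            $depth++;
        } elseif ($s[$i] === ')' && --$depth === 0) {
            return $i;
        }
    }
    return null;
}

// Hypothetical helper: splits into [left-most expression, "!" or null, modifier, remainder].
function splitExpression(string $input): ?array
{
    $len = strlen($input);
    $pos = 0;
    $negated = null;

    // Optional leading "!".
    if ($pos < $len && $input[$pos] === '!') {
        $negated = '!';
        $pos++;
    }

    if ($pos < $len && $input[$pos] === '(') {
        // Bracketed expression: take everything up to the balancing ")".
        $end = matchingParen($input, $pos);
        if ($end === null) {
            return null;
        }
        $left = substr($input, $pos + 1, $end - $pos - 1);
        $pos = $end + 1;
    } elseif (preg_match('/\G[a-zA-Z0-9\-]+(?:,[a-zA-Z0-9\-]+)?/', $input, $m, 0, $pos)) {
        // Plain role: <str>[","<str>], anchored at the current offset by \G.
        $left = $m[0];
        $pos += strlen($left);
    } else {
        return null;
    }

    if ($pos === $len) {
        return [$left, $negated, null, null];
    }

    $modifier = $input[$pos];
    if (($modifier !== '&' && $modifier !== '|') || $pos + 1 >= $len) {
        return null;
    }

    $rest = substr($input, $pos + 1);
    // Strip one pair of brackets only if it wraps the entire remainder.
    if ($rest[0] === '(' && matchingParen($rest, 0) === strlen($rest) - 1) {
        $rest = substr($rest, 1, -1);
    }
    return [$left, $negated, $modifier, $rest];
}
```

Calling `splitExpression('!(token1&token2|(token3&!token4))|token5,12')` should give `['token1&token2|(token3&!token4)', '!', '|', 'token5,12']`, and the remainder can then be fed back into the same function. I would still prefer a single pattern if one exists.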
 
    