I’m trying to parse C++ code, so I need a context-sensitive lexer. In C++, >> is either one token or two (>> or > >), depending on the context. To complicate matters, there is also the token >>=, which is always lexed as a single token regardless of context.
punctuation :: Bool -> Parser Token
punctuation expectDoubleGT = do
    c <- oneOf "{}[]#()<>%;:.+-*/^&|~!=,"
    case c of
        '>' ->
            (char '=' >> return TokGTEq) <|>
                if expectDoubleGT
                    then (string ">=" >> return TokRShiftEq) <|> return TokGT
                    else (char '>' >> ((char '=' >> return TokRShiftEq) <|> return TokRShift)) <|> return TokGT
When expectDoubleGT is False, this function works fine. However, when expectDoubleGT is True (the second-to-last line above), it fails with a parse error on the input >>:
*Parse> parseTest (punctuation True) ">"
TokGT
*Parse> parseTest (punctuation True) ">>="
TokRShiftEq
*Parse> parseTest (punctuation True) ">>"
parse error at (line 1, column 2):
unexpected end of input
expecting ">="
Why does the expression (string ">=" >> return TokRShiftEq) <|> return TokGT raise an error instead of returning TokGT when the remaining input is >? (The first > has already been consumed.)
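For reference, here is a minimal sketch that isolates the failing branch, assuming the parser is built on Parsec's Text.Parsec and that the leading > has already been consumed. The names withoutTry and withTry and the reduced Token type are hypothetical and exist only for this illustration. As far as I understand Parsec, string ">=" consumes the > before it fails on the missing =, and <|> only tries its right-hand alternative when the left-hand side fails without consuming input. Wrapping the left-hand side in try appears to restore the backtracking:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Reduced token type, for this illustration only.
data Token = TokGT | TokRShiftEq
  deriving (Show, Eq)

-- Mirrors the failing branch. string ">=" matches '>' and consumes it,
-- then fails on the missing '='. Because input was consumed, <|> does
-- not fall through to `return TokGT`.
withoutTry :: Parser Token
withoutTry = (string ">=" >> return TokRShiftEq) <|> return TokGT

-- Same alternatives, but `try` makes a failure of string ">=" behave as
-- if no input had been consumed, so <|> can try the second alternative.
withTry :: Parser Token
withTry = try (string ">=" >> return TokRShiftEq) <|> return TokGT

main :: IO ()
main = do
  print (parse withoutTry "" ">")  -- a Left value: the parse error from the question
  print (parse withTry "" ">")     -- Right TokGT
  print (parse withTry "" ">=")    -- Right TokRShiftEq
```

If this reading is right, changing the then branch to try (string ">=" >> return TokRShiftEq) <|> return TokGT should make punctuation True return TokGT for the input >>, leaving the second > unconsumed for the next token.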