Why won't Parsec consider the right-hand side of my <|> alternative?

Question

I’m trying to parse C++ code. Therefore, I need a context-sensitive lexer. In C++, >> is either one or two tokens (>> or > >), depending on the context. To make it even more complex, there is also a token >>= which is always the same regardless of the context.

punctuation :: Bool -> Parser Token
punctuation expectDoubleGT = do
    c <- oneOf "{}[]#()<>%;:.+-*/^&|~!=,"
    case c of
        '>' ->
            (char '=' >> return TokGTEq) <|>
            if expectDoubleGT
                then (string ">=" >> return TokRShiftEq) <|> return TokGT
                else (char '>' >> ((char '=' >> return TokRShiftEq) <|> return TokRShift)) <|> return TokGT

When expectDoubleGT is False, this function works fine. However, when expectDoubleGT is True (the second-to-last line above), it fails with a parse error on the input >>.

*Parse> parseTest (punctuation True) ">"
TokGT
*Parse> parseTest (punctuation True) ">>="
TokRShiftEq
*Parse> parseTest (punctuation True) ">>"
parse error at (line 1, column 2):
unexpected end of input
expecting ">="

Why does the expression (string ">=" >> return TokRShiftEq) <|> return TokGT raise an error rather than returning TokGT when the remaining input is >? (The first > was already consumed.)

score 11 · Accepted Answer · answered Dec 09 '12 at 14:34

Parsec only tries the second parser in

p1 <|> p2

if p1 failed without consuming any input. On the input ">>", after the first '>' has been consumed,

string ">="

fails after consuming the remaining '>', so the second parser isn't used.
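
The difference between failing with and without consuming input can be seen in isolation. This is a minimal sketch, assuming the parsec package with String input; the names succeeds and withoutTry are hypothetical helpers introduced only for illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: True if the parser succeeds on the given input.
succeeds :: Parser a -> String -> Bool
succeeds p s = either (const False) (const True) (parse p "" s)

-- Same shape as the problematic alternative in the question,
-- with the first '>' assumed to be consumed already.
withoutTry :: Parser String
withoutTry = string ">=" <|> return ""

main :: IO ()
main = do
    -- string ">=" matches '>' and then hits end of input: it fails
    -- after consuming, so <|> does not run the right-hand side.
    print (succeeds withoutTry ">")
    -- string ">=" fails on the very first character, consuming
    -- nothing, so the right-hand side is tried.
    print (succeeds withoutTry "x")
```

The first call fails and the second succeeds, even though neither input starts with ">=".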

You need a try

try (string ">=" >> return TokRShiftEq)

there so that if string ">=" fails, no input is consumed and the alternative parser is used.
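
Applied to the '>' case from the question, the fix might look like the sketch below. It assumes the parsec package and a Token type with only the four constructors the question mentions; gtToken and run are hypothetical names, and the other punctuation characters are left out.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed token type: only the constructors used in the question.
data Token = TokGT | TokGTEq | TokRShift | TokRShiftEq
    deriving (Show, Eq)

-- The '>' branch of punctuation, with try added around string ">=".
gtToken :: Bool -> Parser Token
gtToken expectDoubleGT = do
    _ <- char '>'
    (char '=' >> return TokGTEq) <|>
        if expectDoubleGT
            -- try rewinds the input if string ">=" fails partway,
            -- so the fallback to TokGT can run.
            then try (string ">=" >> return TokRShiftEq) <|> return TokGT
            else (char '>' >> ((char '=' >> return TokRShiftEq) <|> return TokRShift)) <|> return TokGT

-- Hypothetical helper: Just the token on success, Nothing on a parse error.
run :: Bool -> String -> Maybe Token
run flag = either (const Nothing) Just . parse (gtToken flag) ""

main :: IO ()
main = mapM_ (print . run True) [">", ">>=", ">>"]
```

With try in place, the input ">>" with expectDoubleGT set to True yields TokGT and leaves the second '>' unconsumed for the next token.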

score -1 · Answer 2 · answered Mar 04 '15 at 20:47

Use libclang. It can parse all of C++. No matter how hard you try, you won't be able to.

Demi

While this isn't a good answer to the question, it is a useful comment. Parsing C and C++ means you [should locally accept ambiguity](http://stackoverflow.com/questions/4172342/complexity-of-parsing-c), and I'm not sure whether Parsec can do that. – Jun 20 '16 at 12:50
