When an input matches two lexer rules, ANTLR chooses either the longest or the first, see disambiguate. With your grammar, in will be interpreted as UNIT, never CONVERT, and the rule
conversion: NUMBER UNIT CONVERT UNIT;
can't work because there are three UNIT tokens :
$ grun Question question -tokens -diagnostics input.txt
[@0,0:0='1',<NUMBER>,1:0]
[@1,1:1=' ',<WS>,channel=1,1:1]
[@2,2:3='in',<UNIT>,1:2]
[@3,4:4=' ',<WS>,channel=1,1:4]
[@4,5:6='in',<UNIT>,1:5]
[@5,7:7=' ',<WS>,channel=1,1:7]
[@6,8:13='meters',<UNIT>,1:8]
[@7,14:14='\n',<NL>,1:14]
[@8,15:14='<EOF>',<EOF>,2:0]
Question last update 0159
line 1:5 missing 'in' at 'in'
line 1:8 mismatched input 'meters' expecting <EOF>
What you can do is to have only ID or TEXT tokens and distinguish them with a label, like this :
grammar Question;
question
@init {System.out.println("Question last update 0132");}
: conversion NL EOF
;
conversion
: NUMBER unit1=ID convert=ID unit2=ID
{System.out.println("Quantity " + $NUMBER.text + " " + $unit1.text +
" to convert " + $convert.text + " " + $unit2.text);}
;
ID : LETTER ( LETTER | DIGIT | '_' )* ; // or TEXT : LETTER+ ;
NUMBER : DIGIT+ ;
NL : [\r\n] ;
WS : [ \t] -> channel(HIDDEN) ; // -> skip ;
fragment LETTER : [a-zA-Z] ;
fragment DIGIT : [0-9] ;
Execution :
$ grun Question question -tokens -diagnostics input.txt
[@0,0:0='1',<NUMBER>,1:0]
[@1,1:1=' ',<WS>,channel=1,1:1]
[@2,2:3='in',<ID>,1:2]
[@3,4:4=' ',<WS>,channel=1,1:4]
[@4,5:6='in',<ID>,1:5]
[@5,7:7=' ',<WS>,channel=1,1:7]
[@6,8:13='meters',<ID>,1:8]
[@7,14:14='\n',<NL>,1:14]
[@8,15:14='<EOF>',<EOF>,2:0]
Question last update 0132
Quantity 1 in to convert in meters
Labels are available from the rule's context in the visitor, so it is easy to distinguish tokens of the same type.