I have a lexer and parser, built with sedlex and menhir in OCaml, to parse spreadsheet formulas.
The following part of the lexer defines regular expressions for the path + workbook + worksheet prefix that precedes a reference. For instance, in ='C:\Users\Pictures\[Book1.xlsx]Sheet1'!A1:B2 the prefix is 'C:\Users\Pictures\[Book1.xlsx]Sheet1'!.
let first_Latin_identifier_character = [%sedlex.regexp? ('a'..'z') | ('A'..'Z') ]
let path_identifier_character = [%sedlex.regexp? first_Latin_identifier_character | decimal_digit | '_' | '-' | ':' | '\x5C' (* \ *) | ' ' | '&' | '@']
let file_identifier_character = [%sedlex.regexp? first_Latin_identifier_character | decimal_digit | '_' | '-' | ' ' | '.']
let file_suffix = [%sedlex.regexp? ".xls" | ".xlsm" | ".xlsx" | ".XLS" | ".XLSM" | ".XLSX" | ".xlsb" | ".XLSB"]
let sheet_identifier_character_in_quote = [%sedlex.regexp? Compl ('\x3A' | '\x5C' | '\x2F' | '\x3F' | '\x2A' | '\x5B' | '\x5D' | '\x27')]
let sheet_identifier_character_out_quote = [%sedlex.regexp? Compl ('\x3A' | '\x5C' | '\x2F' | '\x3F' | '\x2A' | '\x27' | '\x28' | '\x29' | '\x2B' | '\x2D' | '\x2C' | '\x3D' | '\x3E' | '\x3C' | '\x3B')]
let lex_file = [%sedlex.regexp? (Star path_identifier_character), '[', (Plus file_identifier_character), file_suffix, ']']
let lex_file_wo_brackets = [%sedlex.regexp? (Star path_identifier_character), (Plus file_identifier_character), file_suffix]
let lex_sheet_in_quote = [%sedlex.regexp? Plus sheet_identifier_character_in_quote]
let lex_file_sheet_in_quote = [%sedlex.regexp? lex_file, lex_sheet_in_quote]
let lex_before = [%sedlex.regexp?
("'", lex_file_sheet_in_quote, "'!") |
("'", lex_sheet_in_quote, "'!") |
(lex_sheet_out_quote, '!') |
(lex_file, "!") |
(lex_file_wo_brackets, "!") |
("'", lex_file, "'!") |
("'", lex_file_wo_brackets, "'!")]
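To make the three components of such a prefix concrete, here is a minimal sketch in plain OCaml (standard library only, no sedlex). It is not the lexer above and accepts far fewer inputs; `split_prefix` is a hypothetical helper that only handles the quoted form with a bracketed workbook name, i.e. the shape of the first case of lex_before.

```ocaml
(* Hypothetical helper, not part of the real lexer: split a quoted prefix
   such as 'C:\Users\Pictures\[Book1.xlsx]Sheet1'! into
   (path, workbook, sheet). Returns None for any other shape. *)
let split_prefix (s : string) : (string * string * string) option =
  let n = String.length s in
  (* The prefix must start with a quote and end with the two characters '! *)
  if n < 3 || s.[0] <> '\'' || String.sub s (n - 2) 2 <> "'!" then None
  else
    (* Drop the opening quote and the closing '! *)
    let body = String.sub s 1 (n - 3) in
    match String.index_opt body '[', String.index_opt body ']' with
    | Some i, Some j when i < j ->
      let path = String.sub body 0 i in
      let workbook = String.sub body (i + 1) (j - i - 1) in
      let sheet = String.sub body (j + 1) (String.length body - j - 1) in
      Some (path, workbook, sheet)
    | _ -> None

let () =
  match split_prefix {|'C:\Users\Pictures\[Book1.xlsx]Sheet1'!|} with
  | Some (path, workbook, sheet) ->
    Printf.printf "path=%s workbook=%s sheet=%s\n" path workbook sheet
  | None -> print_endline "no match"
(* prints: path=C:\Users\Pictures\ workbook=Book1.xlsx sheet=Sheet1 *)
```

The real lexer has to decide the same thing with regular expressions, which is why lex_before enumerates the quoted and unquoted variants separately.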
Without the last 4 cases of lex_before (i.e., (lex_file, "!") | (lex_file_wo_brackets, "!") | ("'", lex_file, "'!") | ("'", lex_file_wo_brackets, "'!")), compiling the whole project with ocamlc took 3 minutes 30 seconds, and what took the time was compiling lexer.ml. After adding those 4 cases, the total compilation time is 13 minutes 40 seconds, and it is still lexer.ml that takes the time.
Does anyone know how we could identify what slows down the compilation?
Is there anything wrong with the way I write named regular expressions that slows down the compilation?