Using a regex, how can I match comments that begin with a semicolon, unless the semicolon is surrounded on both sides by unescaped quotes, as shown below? (The green blocks denote the matched comments.)
Note that dquotes can be escaped by doubling them up ("").
Such escaped dquotes behave as completely different characters, i.e. they cannot surround a semicolon and disable its comment-starting function.
Also, unbalanced dquotes are treated as escaped dquotes.
With Bubble's help, I have gotten as far as the regex below, which fails to correctly handle the trailing escaped dquote in the last test vector line:
^(?>(?:""[^""\n]*""|[^;""\n]+)*)""?[^"";\n]*(;.*)
See it run here.
Test vectors (the same as in the color-coded diagram above):
Peekaboo ; A comment starts with a semicolon and continues till the EOL
Unless the semicolon is surrounded by dquotes "Don’t do it ; here" ;but match me; once
Im not surrounded "so pay attention to me" ; "peekaboo"
Im not surrounded "so pay attention" to;me" ; "peekaboo"
Im not surrounded "so pay attention to me ; peekaboo
Dquote escapes a dquote so "dont pay attention to ""me;here"" buster" do it ; here
Don’t pay attention to """me;here""" but do ""it;here""
and "dont do ""it;here""" either ;peekaboo
but "pay attention to "it;here"" ;not here though
Simon said "I like goats" then he added "and sheep;" ;a good comment is "here
Simon said "I like goats" then he added "and sheep;" dont do it here
Simon said ""I like goats;"peekaboo
Simon said "I like goats;""peekaboo
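
As a reference for checking candidate regexes against the test vectors, the rules stated above can be sketched as a plain character scanner in Python. This is not a regex solution, and it is only one possible reading of the rules. The names `find_comment_start` and `_find_closing_quote` are hypothetical. It assumes straight ASCII dquotes, and it assumes that a dquote immediately followed by another dquote is always an escaped pair.

```python
from typing import Optional


def _find_closing_quote(line: str, start: int) -> Optional[int]:
    """Return the index of the first lone dquote at or after `start`, or None.

    Doubled dquotes ("") are escaped and never close a quoted span.
    """
    j, n = start, len(line)
    while j < n:
        if line[j] == '"':
            if j + 1 < n and line[j + 1] == '"':
                j += 2  # escaped pair inside the quoted span
                continue
            return j
        j += 1
    return None


def find_comment_start(line: str) -> Optional[int]:
    """Return the index of the comment-starting ';', or None if there is none."""
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch == ';':
            return i  # a semicolon outside any quoted span starts the comment
        if ch == '"':
            if i + 1 < n and line[i + 1] == '"':
                i += 2  # escaped pair outside a quoted span (assumption)
                continue
            close = _find_closing_quote(line, i + 1)
            if close is not None:
                i = close + 1  # skip the whole balanced quoted span
                continue
            # Unbalanced dquote: treated like an escaped one, i.e. an ordinary character.
        i += 1
    return None


# The last test vector: the opening dquote is never closed, so the ';' counts.
line = 'Simon said "I like goats;""peekaboo'
start = find_comment_start(line)
print(line[start:] if start is not None else None)  # prints ;""peekaboo
```

A regex that implements the same rules should report the same comment (or no comment) for every test vector, which makes the scanner usable as an oracle while iterating on the pattern.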
