I have some useful regular expressions in Perl. Is there a simple way to translate them to .NET's dialect of regular expressions?
If not, is there a concise reference of differences?
There is a big comparison table in http://www.regular-expressions.info/refflavors.html.
Most of the basic elements are the same; the differences are:
Minor differences:
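- Unicode escapes: .NET uses \u200A, Perl uses \x{200A}.
- \v in .NET is just the vertical tab (U+000B); in Perl it stands for the "vertical whitespace" class, which is why Perl also has \V for its negation.
- A conditional on a named group is written (?(name)yes|no) in .NET, but (?(<name>)yes|no) in Perl.
Some elements are Perl-only:
- Possessive quantifiers (x?+, x*+, x++ etc.). Use an atomic (non-backtracking) subexpression (?>…) instead.
- Named Unicode escapes such as \N{LATIN SMALL LETTER X} and \N{U+200A}.
- Case modification and quoting:
  - \l (lower-case the next char), \u (upper-case the next char).
  - \L (lower case), \U (upper case) and \Q (quote metacharacters), each in effect until \E.
- The brace-less Unicode property shorthand \pL and \PL. In .NET you have to include the braces, e.g. \p{L}.
- Odd things like \X and \C.
- Special character classes like \v, \V, \h, \H, \N and \R.
- \g-style backreferences by number, including relative ones: \g1, \g{-1}. .NET only supports absolute group numbers (\1).
- Named backreference \g{name}. Use \k<name> instead.
- POSIX character classes such as [[:alpha:]].
- Branch reset groups (?|…).
- \K. Use a look-behind (?<=…) instead.
- Code evaluation assertions (?{…}) and postponed subexpressions (??{…}).
- Recursion and subroutine calls: (?0), (?R), (?1), (?-1), (?+1), (?&name).
- Perl-specific conditions in conditional expressions: code (?{…}), recursion checks (R), (R1), (R&name), and (DEFINE).
- Backtracking control verbs (*VERB:ARG).
- Python-style syntax:
  - (?P<name>…). Use (?<name>…) instead.
  - (?P=name). Use \k<name> instead.
  - (?P>name). No equivalent in .NET.
Some elements are .NET-only:
- Variable-length look-behind. In Perl, for a positive look-behind, use \K instead.
- An arbitrary regular expression as the condition: (?(pattern)yes|no).
- Character class subtraction, e.g. [a-z-[d-w]].
- Balancing groups (?<-name>…). These could be simulated in Perl with a code evaluation assertion (?{…}) followed by a (?&name).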
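A minimal C# sketch of the .NET-only constructs: balancing groups, character class subtraction, a bare pattern used as a condition, and an unbounded look-behind. The class and method names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotNetOnly
{
    // Balancing groups: (?<open>\() pushes a capture, (?<-open>\)) pops one and fails
    // when there is nothing left to pop; (?(open)(?!)) rejects unclosed parentheses.
    static readonly Regex Balanced =
        new Regex(@"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");

    // Character class subtraction: lowercase ASCII letters minus the vowels.
    static readonly Regex Consonants = new Regex(@"^[a-z-[aeiou]]+$");

    // Bare pattern as the condition: if the next char is a digit, expect three digits,
    // otherwise three letters. Perl needs an explicit look-around here: (?(?=\d)...).
    static readonly Regex DigitsOrLetters = new Regex(@"^(?(\d)\d{3}|[a-z]{3})$");

    // Look-behind of unbounded length; Perl will not compile this because \s* has no upper bound.
    static readonly Regex Amount = new Regex(@"(?<=\b(?:USD|EUR)\s*)\d+");

    public static bool IsBalanced(string s) => Balanced.IsMatch(s);
    public static bool IsAllConsonants(string s) => Consonants.IsMatch(s);
    public static bool IsDigitsOrLetters(string s) => DigitsOrLetters.IsMatch(s);
    public static string FirstAmount(string s) => Amount.Match(s).Value;

    public static void Main()
    {
        Console.WriteLine(IsBalanced("(a(b)c)"));          // True
        Console.WriteLine(IsBalanced("(a(b c)"));          // False
        Console.WriteLine(IsAllConsonants("rhythm"));      // True
        Console.WriteLine(IsDigitsOrLetters("1bc"));       // False
        Console.WriteLine(FirstAmount("total: EUR  42"));  // 42
    }
}
```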
They were designed to be largely compatible with Perl 5 regexes, so most Perl 5 regexes should work in .NET as-is.
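"Most" is the key word: a pattern can parse in both engines and still match differently. A small C# sketch of the best-known case, \v, assuming you want Perl's meaning (the class name is hypothetical; the character list is the set Perl's \v matches):

```csharp
using System;
using System.Text.RegularExpressions;

public static class SameSyntaxDifferentMeaning
{
    // What Perl's \v matches: LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR.
    public const string PerlVerticalWhitespace = @"[\n\x0B\f\r\x85\u2028\u2029]";

    public static void Main()
    {
        // Parses fine in .NET, but here \v is only U+000B (vertical tab).
        Console.WriteLine(Regex.IsMatch("line1\nline2", @"\v"));                   // False
        Console.WriteLine(Regex.IsMatch("line1\nline2", PerlVerticalWhitespace));  // True

        // Code-point escapes: Perl writes \x{200A}; .NET writes \u200A.
        Console.WriteLine(Regex.IsMatch("a\u200Ab", @"a\u200Ab"));                 // True
    }
}
```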
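Several RegexOptions values correspond directly to Perl modifiers. This is the framework's own declaration of System.Text.RegularExpressions.RegexOptions (shown for reference, not something to define yourself), with the Perl equivalents noted: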
[Flags]
public enum RegexOptions
{
    Compiled = 8,
    CultureInvariant = 0x200,
    ECMAScript = 0x100,
    ExplicitCapture = 4,            // n in Perl 5.22+
    IgnoreCase = 1,                 // i in Perl
    IgnorePatternWhitespace = 0x20, // x in Perl
    Multiline = 2,                  // m in Perl
    None = 0,
    RightToLeft = 0x40,
    Singleline = 0x10               // s in Perl
}
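Building on that mapping, a minimal sketch that turns a Perl modifier string into RegexOptions. FromPerl is a hypothetical helper; it treats anything other than i, m, s, x and n as unsupported, because modifiers such as g and e control how Perl applies the match rather than the pattern itself (in .NET that job falls to Regex.Matches or a MatchEvaluator). .NET also accepts the same letters inline, e.g. (?mx), so the flags can stay in the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PerlModifiers
{
    // Maps modifier letters from a Perl match such as m/.../imsx to RegexOptions.
    public static RegexOptions FromPerl(string modifiers)
    {
        RegexOptions options = RegexOptions.None;
        foreach (char c in modifiers)
        {
            switch (c)
            {
                case 'i': options |= RegexOptions.IgnoreCase; break;
                case 'm': options |= RegexOptions.Multiline; break;
                case 's': options |= RegexOptions.Singleline; break;
                case 'x': options |= RegexOptions.IgnorePatternWhitespace; break;
                case 'n': options |= RegexOptions.ExplicitCapture; break; // Perl 5.22+
                default:
                    // e.g. /g or /e: not pattern options in .NET.
                    throw new ArgumentException(
                        $"No RegexOptions equivalent for Perl modifier '{c}'.", nameof(modifiers));
            }
        }
        return options;
    }

    public static void Main()
    {
        var rx = new Regex(@"^ \w+ $  # one word per line", FromPerl("mx"));
        Console.WriteLine(rx.Matches("foo\nbar").Count);                   // 2

        // The same flags given inline instead of through RegexOptions.
        Console.WriteLine(Regex.Matches("foo\nbar", @"(?mx) ^ \w+ $").Count); // 2
    }
}
```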
Another tip is to use C# verbatim strings (@"...") so that you don't have to double every backslash:
string badOnTheEyesRx = "\\d{4}/\\d{2}/\\d{2}";
string easierOnTheEyesRx = @"\d{4}/\d{2}/\d{2}";
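A small check that the two spellings above produce the same string, plus the one escape verbatim strings still need: a double quote is written as "". The class and field names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimStrings
{
    public static readonly string BadOnTheEyesRx = "\\d{4}/\\d{2}/\\d{2}";
    public static readonly string EasierOnTheEyesRx = @"\d{4}/\d{2}/\d{2}";

    // In a verbatim string a literal double quote is written as "" (not \").
    // This is the regex "\w+" including the surrounding quote characters.
    public static readonly string QuotedWordRx = @"""\w+""";

    public static void Main()
    {
        Console.WriteLine(BadOnTheEyesRx == EasierOnTheEyesRx);                  // True
        Console.WriteLine(Regex.IsMatch("2010/03/15", EasierOnTheEyesRx));       // True
        Console.WriteLine(Regex.Match("say \"hi\" twice", QuotedWordRx).Value);  // "hi"
    }
}
```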
It really depends on the complexity of the regular expression: many will work unchanged.
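One cheap way to find the patterns that will not work out of the box is to let the .NET parser try them. A sketch, where TryCompile is a hypothetical helper name: syntax .NET lacks, such as possessive quantifiers or \g backreferences, is rejected with an ArgumentException, but a pattern that parses can still mean something different (\v is the classic example), so this only catches the loud failures.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PortCheck
{
    // Returns true if .NET accepts the pattern; otherwise returns the parser's message.
    // A successful parse does not prove the pattern means the same thing as in Perl.
    public static bool TryCompile(string pattern, out string error)
    {
        try
        {
            _ = new Regex(pattern);
            error = string.Empty;
            return true;
        }
        catch (ArgumentException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        string[] perlPatterns = { @"\d{4}-\d{2}", @"a++b", @"(\w)\g1", @"\v" };
        foreach (string p in perlPatterns)
        {
            bool ok = TryCompile(p, out string error);
            Console.WriteLine(ok ? $"{p}: parses (still check semantics)" : $"{p}: rejected: {error}");
        }
    }
}
```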
Take a look at this .NET regex cheat sheet to see if an operator does what you expect it to do.
I don't know of any tool that automatically translates between RegEx dialects.
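For the purely syntactic differences, a few mechanical rewrites go a long way. The sketch below is a hypothetical, deliberately naive helper (PerlToDotNet.Translate), not an existing tool: it rewrites the pattern as plain text, so it does not understand character classes, \Q…\E or escaped backslashes. Anything it does not recognize (relative \g{-1}, (?P>name), possessive quantifiers on groups or classes) is left unchanged, and most of those leftovers then fail to parse in .NET rather than silently changing meaning.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PerlToDotNet
{
    // Text-level rewrite rules, applied in order. Naive by design: \\pL (an escaped
    // backslash followed by "pL") would be rewritten wrongly, for example.
    static readonly (Regex From, string To)[] Rules =
    {
        // Possessive quantifier on a single atom: \d++ -> (?>\d+), a{2,3}+ -> (?>a{2,3})
        (new Regex(@"(\\.|[^\\()\[\]|*+?{}])([*+?]|\{\d+(?:,\d*)?\})\+"), "(?>$1$2)"),
        // Python-style named backreference and named group.
        (new Regex(@"\(\?P=(\w+)\)"), @"\k<$1>"),
        (new Regex(@"\(\?P<"), "(?<"),
        // \g{name}, \g{1}, \g1 -> \k<...> (.NET treats \k<1> as group 1).
        (new Regex(@"\\g\{(\w+)\}"), @"\k<$1>"),
        (new Regex(@"\\g(\d+)"), @"\k<$1>"),
        // Single-letter Unicode property shorthand: \pL -> \p{L}, \PL -> \P{L}
        (new Regex(@"\\([pP])([A-Za-z])"), @"\$1{$2}"),
    };

    public static string Translate(string perlPattern)
    {
        string result = perlPattern;
        foreach (var (from, to) in Rules)
            result = from.Replace(result, to);
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Translate(@"(?P<year>\d{4})-(?P=year)")); // (?<year>\d{4})-\k<year>
        Console.WriteLine(Translate(@"\d++\pL"));                   // (?>\d+)\p{L}

        // The atomic group keeps possessive semantics: a++a can never match.
        Console.WriteLine(Regex.IsMatch("aaa", Translate("a++a")));  // False
    }
}
```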