I would like to use the group capture in a regular expression to parse a string. Can this be done with MySQL?
For example,
SELECT my_wishful_function('->hello world<-', '/->(.*)<-/')
and have it return:
hello world
I would like to use the group capture in a regular expression to parse a string. Can this be done with MySQL?
For example,
SELECT my_wishful_function('->hello world<-', '/->(.*)<-/')
and have it return:
hello world
MySQL regex engine does not support backreferences, unlike other RDBMS such as Oracle or Postgres.
You can use regexp_replace() instead. That is, remove the unwanted parts instead of capturing the wanted part:
select regexp_replace('->hello world<-', '(^->)|(<-$)', '')
Regexp explanation: ^-> is the unwanted part at the beginning of the string, and <-$ is the unwanted part at the end of the string. We surround the parts with parentheses to define groups, and separate them with |, which means or.
select regexp_replace('->hello world<-', '(^->)|(<-$)', '') new_col
| new_col | | :---------- | | hello world |
 
    
     
    
    An elegant solution is to use REGEXP_SUBSTR() and positive lookahead/behind assertions :
SELECT REGEXP_SUBSTR('->hello world<-', '(?<=->).*(?=<-)');
-- hello world
(?<=...) is a positive lookbehind assertion : we check matching pattern is preceded by ...
(?=...) is a positive lookahead assertion : we check matching pattern is followed by ...
These assertions are not capturing, so we don't need to delete delimiters -> and <-
It's compatible with MySQL 8
In MySQL 5.7, if the boundaries -> and <- only occur once in the string, you can use SUBSTRING_INDEX twice to extract the portion between the boundaries:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('I said ->hello world<- today', '->', -1), '<-', 1)
Output
hello world
