I happened to be looking at this question, and I chanced upon this string:
#2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5,$,$, [#40,#221,#268,#281],#2334
And I got interested in trying to replace only the commas (,) within the substring [#40,#221,#268,#281] with underscores (_). I was attempting this in R with the stringr package, and my idea was to use str_replace() as follows:
- First locate the substring in the parent string with lookarounds: `(?<=\\[).+(?=\\[)`. (I am using `\\` to escape since that's what `stringr` uses.)
- Then match all instances of only the commas within the substring with `[^0-9#]+`. So now the regex would be `(?<=\\[)[^0-9#]+(?=\\[)`.
- Now use `str_replace()` to replace the above matches with `_` as follows: `str_replace(mystring, "(?<=\\[)[^0-9#]+(?=\\[)", "_")`
- where `mystring` contains the string `#2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5,$,$, [#40,#221,#268,#281],#2334`
I thought the regex I constructed should parse as: replace one or more characters that are not digits or # within the bounds of [ and ] with the character _. But evidently, this isn't the case as my attempt did not work.
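For reference, here is a minimal reproducible sketch of the attempt. It assumes the stringr package is installed; the variable name `result` is only for illustration, and the pattern is copied verbatim from the steps above. As far as I can tell, the pattern finds no match, so the string comes back unchanged.

```r
library(stringr)

# The parent string from the top of the question
mystring <- "#2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5,$,$, [#40,#221,#268,#281],#2334"

# The attempt exactly as described above (pattern not corrected)
result <- str_replace(mystring, "(?<=\\[)[^0-9#]+(?=\\[)", "_")

# No replacement happens: the result is the same as the input
print(identical(result, mystring))
```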
Where am I going wrong and what is/are the right way(s) to solve regex problems of this kind?
tl;dr: how does one extract all tokens but a certain token (or set of tokens) from a substring bounded by two other arbitrary tokens?