I have a table with a column of strings. From each string, I want to extract every piece of text that comes immediately after a certain substring. For this minimal reproducible example, let's assume the substring is abc, so I want the term that follows each occurrence of abc.
I can do this when there is only one abc per row, but my logic fails when there are several. I'm also getting the number of occurrences per row, but I'm having trouble using that count to retrieve every one of them.
My approach/attempt:
I created a temp table that contains the number of regex matches in my main string:
CREATE TEMP TABLE match_count AS (
    SELECT DISTINCT id, main_txt, regexp_count(main_txt, 'abc (\\S+)', 1) AS cnt
    FROM my_data_source
    WHERE regexp_count(main_txt, 'abc (\\S+)', 1) > 0
);
My output:
id   main_txt                         cnt
1    wpfwe abc weiofnew abc wieone    2
2    abc weoin                        1
3    abc weoifn abc we abc w          3
To get my final output, I have a query like:
SELECT id, main_txt, regexp_substr(main_txt, 'abc (\\S+)', 1, cnt, 'e') AS output
FROM match_count;
My actual final output:
id   main_txt                         output
1    wpfwe abc weiofnew abc wieone    wieone
2    abc weoin                        weoin
3    abc weoifn abc we abc w          w
My expected final output:
id   main_txt                         output
1    wpfwe abc weiofnew abc wieone    weiofnew
1    wpfwe abc weiofnew abc wieone    wieone
2    abc weoin                        weoin
3    abc weoifn abc we abc w          weoifn
3    abc weoifn abc we abc w          we
3    abc weoifn abc we abc w          w
So my code only returns the last match in each row (the one whose occurrence number equals cnt). How can I modify it to return every match, one row per match?
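One possible direction, as an untested sketch rather than a confirmed solution: instead of passing cnt as the occurrence argument, join match_count against a helper table of integers so that each row is repeated once for every occurrence index from 1 to cnt. The numbers table and its column n below are hypothetical; they are not part of my schema and would have to be created first, with at least as many rows as the largest cnt.

```sql
-- Sketch only. Assumes a hypothetical helper table numbers(n)
-- holding the integers 1, 2, 3, ... up to at least max(cnt).
SELECT m.id,
       m.main_txt,
       -- n.n selects the n-th match instead of only the cnt-th one
       regexp_substr(m.main_txt, 'abc (\\S+)', 1, n.n, 'e') AS output
FROM match_count AS m
JOIN numbers AS n
  ON n.n <= m.cnt          -- repeat each row once per occurrence
ORDER BY m.id, n.n;
```

If this works as intended, id 3 would join to n = 1, 2, 3 and produce one row per match, which is the shape of my expected output above.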