I have text extracted from a large PDF file, and I am only interested in one part of it: the part that lies between two occurrences of the substring test and that contains one or more occurrences of the specific word XX12QW. The first of those two test occurrences can be included in the match, as shown in the desired output below.
Input String:
test
abc def
test 123
test pqr
XX12QW
jkl XX12QW hjas
12asd23 test bxs
Desired Output:
test pqr
XX12QW
jkl XX12QW hjas
12asd23
Things to be noted:
- There are multiple occurrences of the substring test.
- I need only the part between 2 occurrences of the substring/word test which contains 1 or more occurrences of the word XX12QW. This word XX12QW will not be present at all between any other pair of the word test. That is, there will never be a case like this: test abc XX12QW test isadkj XX12QW test an test
- One extra test case would be if the word XX12QW is present between test and $ (end of string/file):
  - Input: test absjh123 sjnc test jhsd32 test aabb XX12QW asdj XX12QW sdfk
  - Desired Output: test aabb XX12QW asdj XX12QW sdfk
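To make the requirements above concrete, here is a minimal non-regex sketch in Python. It assumes the whole text fits in memory as one string; the function name extract_segment and its parameters are hypothetical, chosen only for illustration. It splits on test and returns the first piece that contains XX12QW, with the leading test put back and surrounding whitespace trimmed.

```python
def extract_segment(text, marker="test", word="XX12QW"):
    """Return the first marker-delimited piece containing `word`, or None.

    Hypothetical helper for illustration; names are not from any library.
    """
    # Every piece after the first is the text that followed one occurrence
    # of the marker, up to the next marker or the end of the string.
    pieces = text.split(marker)
    for piece in pieces[1:]:
        if word in piece:
            # Re-attach the opening marker and trim surrounding whitespace.
            return (marker + piece).strip()
    return None


sample = """test
abc def
test 123
test pqr
XX12QW
jkl XX12QW hjas
12asd23 test bxs"""

print(extract_segment(sample))
```

Because the last piece produced by the split runs to the end of the string, the end-of-string case needs no special handling in this sketch.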
I have been stuck on this for a long time now and really need someone else to look at it.
The regex I have tried so far: test[\s\S]*?XX12QW[\s\S]*?(?=test)
I would really appreciate any help.
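For reference, a hedged sketch of how the attempted regex behaves in Python's re module, next to one possible variant. The lazy [\s\S]*? is free to run across intermediate occurrences of test, so the match starts at the very first test, and the trailing (?=test) cannot succeed when the segment runs to the end of the string. The variant below uses a negative lookahead before each consumed character so the match cannot cross another test, and accepts end of string as the closing boundary. The names attempt, tempered and extract_with_regex are hypothetical; other regex engines may need a different end-of-string anchor than \Z.

```python
import re

# The attempted pattern, unchanged.
attempt = re.compile(r"test[\s\S]*?XX12QW[\s\S]*?(?=test)")

# Variant (an assumption, not the only possible fix):
# (?:(?!test)[\s\S]) consumes one character only if "test" does not start
# at that position, so the match can never span a second "test".
# (?=test|\Z) lets the end of the string close the segment as well.
tempered = re.compile(
    r"test(?:(?!test)[\s\S])*?XX12QW(?:(?!test)[\s\S])*(?=test|\Z)"
)


def extract_with_regex(text):
    """Hypothetical helper: return the matched segment, trimmed, or None."""
    match = tempered.search(text)
    return match.group(0).strip() if match else None


first_input = "test\nabc def\ntest 123\ntest pqr\nXX12QW\njkl XX12QW hjas\n12asd23 test bxs"
second_input = "test absjh123 sjnc test jhsd32 test aabb XX12QW asdj XX12QW sdfk"

print(attempt.search(first_input).group(0))   # begins at the first "test"
print(extract_with_regex(first_input))
print(extract_with_regex(second_input))
```

The final strip() is there because the matched text otherwise keeps the whitespace that precedes the closing test.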