I'm trying to write some JavaScript that traverses an HTML document and picks out words listed in a JSON array. When a word matches, the script should wrap it in <a href='glossary/#[matched text]'>[matched text]</a> and render the result to the screen.
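To make the wrapping step concrete, here is a minimal sketch of what I mean, applied to a plain string with no markup in it. The names glossaryTerms, escapeRegExp and linkTerms are just placeholders for illustration, and the term list stands in for the real JSON array. It deliberately does nothing about skipping elements or attributes.

```javascript
// Hypothetical glossary terms; in practice these would come from the JSON array.
const glossaryTerms = ["text", "regex"];

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap each whole-word occurrence of a glossary term in a glossary link.
// Assumption: `plainText` contains no HTML markup at all.
function linkTerms(plainText, terms) {
  if (terms.length === 0) return plainText;
  const pattern = new RegExp(
    "\\b(" + terms.map(escapeRegExp).join("|") + ")\\b",
    "gi"
  );
  return plainText.replace(
    pattern,
    (match) => "<a href='glossary/#" + match + "'>" + match + "</a>"
  );
}

console.log(linkTerms("Some text here", glossaryTerms));
```

The word boundaries mean a term inside a longer word (for example "context") is left alone, which is the behaviour I want.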
I seem to have that part more or less working. Where I'm falling over is how best to tell the script to ignore certain elements (e.g. text already inside a, button, input, element attributes, etc.). I've tried to handle this in the regex itself and fumbled my way to the following:
/(?<!<(a|button|submit|pre|img|svg|path|h[0-9]|.*data-ignore.*>|input\/>|textarea|pre|code))((?<!(="|data-))\btext\b(?!"))(?!<\/(a|button|submit|pre|img|svg|path|h[0-9])>)/gi
(text is the word I'm trying to auto-link) - https://regex101.com/r/u7cLPR/1
If you follow the Regex101 link you'll see that I think I've managed to cover all bases bar one: when the word occurs inside a class='' attribute (and therefore inside other attributes such as style).
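A small reproduction of the case that still slips through, using the pattern above unchanged. The sample markup is made up for illustration; the word sits inside a single-quoted class attribute and still gets matched.

```javascript
// The pattern from above, with "text" as the word being auto-linked.
const pattern = /(?<!<(a|button|submit|pre|img|svg|path|h[0-9]|.*data-ignore.*>|input\/>|textarea|pre|code))((?<!(="|data-))\btext\b(?!"))(?!<\/(a|button|submit|pre|img|svg|path|h[0-9])>)/gi;

// Hypothetical sample: the word appears only inside a class attribute value.
const sample = "<div class='some text here'>hello</div>";

// With the g flag, match() returns every matched substring (or null if none).
const matches = sample.match(pattern) || [];

// Any match here is unwanted, because it would inject a link into the attribute.
console.log(matches);
```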
Any help would be greatly appreciated. As always with regex, I seem to either miss the mark or over-complicate the solution. Is regex even the right tool for the job here?