I'm trying to parse raw Wikipedia article content, e.g. the article on Sweden, using re.sub(). However, I am running into problems trying to substitute blocks of {{some text}}, because they can contain further blocks of {{some text}}.
Here's an abbreviated example from the above article:
{{Infobox country
| conventional_long_name = Kingdom of Sweden
| native_name = {{native name|sv|Konungariket Sverige|icon=no}}
| common_name = Sweden
}}
Some text I do not want parsed.
{{Link GA|eo}}
This nesting of curly braces within curly braces could, in theory, go arbitrarily deep.
If I use the greedy pattern {{.+}}, everything from {{Infobox to eo}} is matched, including the text I do not want matched.
If I use the non-greedy {{.+?}} instead, the part from {{Infobox to icon=no}} is matched, as is {{Link GA|eo}}. But then I'm left with the string | common_name [...] not want parsed.
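A minimal reproduction of the greedy and non-greedy attempts, as a sketch. It assumes Python's re with re.DOTALL, since a greedy match running from {{Infobox all the way to eo}} implies that `.` was allowed to cross newlines. SAMPLE is simply the abbreviated excerpt above pasted into a string.

```python
import re

# The abbreviated wikitext excerpt from the question.
SAMPLE = """{{Infobox country
| conventional_long_name = Kingdom of Sweden
| native_name = {{native name|sv|Konungariket Sverige|icon=no}}
| common_name = Sweden
}}
Some text I do not want parsed.
{{Link GA|eo}}"""

# Assumption: re.DOTALL, so "." can match across line breaks.
# Greedy: runs from the first "{{" to the LAST "}}" in the text.
greedy = re.sub(r"\{\{.+\}\}", "", SAMPLE, flags=re.DOTALL)

# Non-greedy: stops at the FIRST "}}", which closes the inner template,
# leaving the tail of the Infobox behind.
lazy = re.sub(r"\{\{.+?\}\}", "", SAMPLE, flags=re.DOTALL)

print(repr(greedy))
print(repr(lazy))
```

The greedy result is an empty string (the wanted text is gone too), while the non-greedy result still contains "| common_name = Sweden" and the orphaned "}}".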
I also tried variants of \{\{.+(\{\{.+\}\})*.+\}\} and \{\{[^\{]+(\{\{[^\{]+\}\})*[^\{]+\}\}, in the hopes of matching only sub-blocks within the larger block, but to no avail.
I'd list everything I've tried, but I honestly can't remember half of it, and I doubt it would be much use anyway. It always comes back to the same problem: a closing }} should only end the block once every {{ opened before it has been closed, i.e. the {{ and }} seen so far have to balance.
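The balancing described above amounts to keeping a nesting counter, and Python's built-in re module has no recursive-pattern syntax to express that. A small hand-written scanner can track the depth directly. The following is a minimal sketch; strip_balanced is a hypothetical helper name, not part of any library.

```python
def strip_balanced(text: str, open_tok: str = "{{", close_tok: str = "}}") -> str:
    """Remove every top-level {{...}} block by tracking nesting depth."""
    out = []
    depth = 0  # how many "{{" are currently open
    i = 0
    while i < len(text):
        if text.startswith(open_tok, i):
            depth += 1
            i += len(open_tok)
        elif depth > 0 and text.startswith(close_tok, i):
            # A "}}" only counts if something is open; otherwise it is plain text.
            depth -= 1
            i += len(close_tok)
        else:
            if depth == 0:
                out.append(text[i])  # outside any template: keep it
            i += 1
    return "".join(out)


print(repr(strip_balanced("{{Infobox|x={{native name|sv}}}}\nSome text.\n{{Link GA|eo}}")))
```

Limitations of this sketch: an unclosed {{ swallows the rest of the text, and triple-brace parameters such as {{{1}}} are not treated specially.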
Is this even solvable using regular expressions, or do I need another solution?
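If the goal is to stay with re.sub, one workaround (a sketch, not a general wikitext parser) is to match only innermost blocks, i.e. a {{...}} that contains no other {{ or }}, and repeat the substitution until nothing changes. Each pass peels off one level of nesting, so arbitrary depth is handled by the loop rather than by the pattern. INNERMOST and strip_templates are hypothetical names.

```python
import re

# One innermost block: "{{", then any characters that do not begin
# another "{{" or a "}}", then "}}". DOTALL lets it span lines.
INNERMOST = re.compile(r"\{\{(?:(?!\{\{|\}\}).)*\}\}", re.DOTALL)


def strip_templates(text: str) -> str:
    """Repeatedly delete innermost {{...}} blocks until none remain."""
    while True:
        # subn returns (new_text, number_of_substitutions)
        text, count = INNERMOST.subn("", text)
        if count == 0:
            return text


print(repr(strip_templates("x {{native name|sv|{{lang|y}}}} z")))
```

Unbalanced braces are simply left in place, since a pass that finds no innermost block ends the loop.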