I am writing a regular expression in python to capture the contents inside an SSI tag.
I want to parse the tag:
<!--#include file="/var/www/localhost/index.html" set="one" -->
into the following components:
- Tag Function (ex: include,echoorset)
- Name of attribute, found before the =sign
- Value of attribute, found in between the "'s
The problem is that I am at a loss on how to grab these repeating groups, as name/value pairs may occur one or more times in a tag. I have spent hours on this.
Here is my current regex string:
^\<\!\-\-\#([a-z]+?)\s([a-z]*\=\".*\")+? \-\-\>$
It captures the include in the first group and file="/var/www/localhost/index.html" set="one" in the second group, but what I am after is this:
group 1: "include"
group 2: "file"
group 3: "/var/www/localhost/index.html"
group 4 (optional): "set"
group 5 (optional): "one"
(continue for every other name="value" pair)
 
     
     
     
     
     
     
    