I am experimenting with regular expressions in Java, in particular with groups. I am trying to strip empty tags from a string containing XML. Without groups everything works as expected, but as soon as I define the regex with groups, it behaves in a way I don't understand. I expect the behavior shown in the last assertion in the code below:

    @Test
    public void testRegexpGroups() {
        String xml =
            "<root>\n" +
                "    <yyy></yyy>\n" +
                "    <yyy>456</yyy>\n" +
                "    <aaa>  \n\n" +
                "    </aaa>\n" +
                "</root>";
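        // A: groups for the leading whitespace and the tag name, reused as \1 and \2
        Pattern patternA = Pattern.compile("(\\s*)<(\\s*\\w+\\s*)>(\\1)</(\\2)>");
        // B: like A, but only the tag name is reused (\2); the tag content is a plain \s*
        Pattern patternB = Pattern.compile("(\\s*)<(\\s*\\w+\\s*)>\\s*</(\\2)>");
        // C: no groups at all
        Pattern patternC = Pattern.compile("\\s*<\\s*\\w+\\s*>\\s*</\\s*\\w+\\s*>");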
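        // patternA: this assertion passes, but it is not the result I want:
        // only <yyy></yyy> is removed; the whitespace before it and the whole <aaa> element stay
        assertEquals(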
            "<root>\n" +
            "    \n" +
            "    <yyy>456</yyy>\n" +
            "    <aaa>  \n" +
            "\n" +
            "    </aaa>\n" +
            "</root>",
            patternA.matcher(xml).replaceAll("")
        );
        // patternB: gives the result I want
        assertEquals(
            "<root>\n" +
                "    <yyy>456</yyy>\n" +
                "</root>",
            patternB.matcher(xml).replaceAll("")
        );
        // patternC: gives the result I want (this is what I expected from patternA as well)
        assertEquals(
            "<root>\n" +
                "    <yyy>456</yyy>\n" +
                "</root>",
            patternC.matcher(xml).replaceAll("")
        );
    }

I get the result I want with this regex: "\\s*<\\s*\\w+\\s*>\\s*</\\s*\\w+\\s*>", but I don't understand why "(\\s*)<(\\s*\\w+\\s*)>(\\1)</(\\2)>" does not do the same.
Could someone explain why these regular expressions behave differently?
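
To make the difference easier to see, here is a minimal standalone sketch (no JUnit) that lists what each group captures for every match of patternA and patternB on the same input. The class name `GroupDump` and the helpers `dump` and `show` are made up for this illustration; only `java.util.regex.Pattern` and `Matcher` from the JDK are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDump {

    // Hypothetical helper: returns one line per match, listing the whole match
    // and every capturing group.
    public static List<String> dump(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            StringBuilder sb = new StringBuilder("match=[" + show(m.group()) + "]");
            for (int i = 1; i <= m.groupCount(); i++) {
                sb.append(" g").append(i).append("=[").append(show(m.group(i))).append("]");
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    // Hypothetical helper: makes newlines visible as a literal \n.
    public static String show(String s) {
        return s == null ? "null" : s.replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String xml =
            "<root>\n" +
            "    <yyy></yyy>\n" +
            "    <yyy>456</yyy>\n" +
            "    <aaa>  \n\n" +
            "    </aaa>\n" +
            "</root>";

        System.out.println("patternA:");
        dump("(\\s*)<(\\s*\\w+\\s*)>(\\1)</(\\2)>", xml).forEach(System.out::println);

        System.out.println("patternB:");
        dump("(\\s*)<(\\s*\\w+\\s*)>\\s*</(\\2)>", xml).forEach(System.out::println);
    }
}
```

For patternA this reports a single match, `<yyy></yyy>`, in which group 1 and group 3 are both empty. For patternB it reports two matches, each with the leading `\n    ` captured in group 1.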
 
    