Try it this way:
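import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "aaaabbbaaaaab";
Matcher m = Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)").matcher(data);
while (m.find()) {
    System.out.println(m.group(1)); // group 1 is captured inside the lookahead
}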
This regex uses lookaround mechanisms and will find each (a+b+|b+a+) that
- exists at the start ^ of the input
- starts with b that is preceded by a
- starts with a that is preceded by b.
Output:
aaaabbb
bbbaaaaa
aaaaab
Is ^ essentially needed in this regular expression?
Yes. Without ^ this regex wouldn't capture the aaaabbb placed at the start of the input.
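A quick way to check this is to run the pattern with and without the ^ alternative. This is only a sketch: the class name AnchorDemo and the helper findAll are made up for the illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: collects group(1) of every match of regex in data.
    static List<String> findAll(String regex, String data) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(data);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "aaaabbbaaaaab";
        // With ^: the run that begins at index 0 is reported.
        System.out.println(findAll("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)", data));
        // prints [aaaabbb, bbbaaaaa, aaaaab]

        // Without ^: only runs that begin right after the other letter are reported.
        System.out.println(findAll("(?=(a+b+|b+a+))((?<=a)b|(?<=b)a)", data));
        // prints [bbbaaaaa, aaaaab]
    }
}
```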
If I hadn't added (^|(?<=a)b|(?<=b)a) after (?=(a+b+|b+a+)), this regex would match
aaaabbb
aaabbb
aabbb
abbb
bbbaaaaa
bbaaaaa
baaaaa
aaaaab
aaaab
aaab
aab
ab
so I needed to limit these results to only those that start with a that has b before it (without including that b in the match, so look-behind was perfect for that) or with b that is preceded by a.
But let's not forget about an a or b that is placed at the start of the string and is not preceded by anything. To include it we can use ^.
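To make the filtering visible, one option is to print the index at which each match starts, first for the lookahead alone and then for the full pattern. This is a rough sketch; MatchStartsDemo and matchStarts are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchStartsDemo {
    // Hypothetical helper: returns the index at which each match of regex begins.
    static List<Integer> matchStarts(String regex, String data) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(data);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String data = "aaaabbbaaaaab";
        // Lookahead alone: succeeds at every index from which a+b+ or b+a+ can be read.
        System.out.println(matchStarts("(?=(a+b+|b+a+))", data));
        // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

        // Full pattern: only index 0 and the indexes where the letter changes
        // (and the lookahead still succeeds) are left.
        System.out.println(matchStarts("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)", data));
        // prints [0, 4, 7]
    }
}
```

Index 12 (the final b) is also preceded by a, but the lookahead fails there because no a follows it, so it is not reported.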
Maybe it will be easier to show this idea with this regex:
(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)
Here (?<=^|a)b will match b that is placed at the start of the string or has a before it,
and (?<=^|b)a will match a that is placed at the start of the string or has b before it.
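As a sanity check, the sketch below runs both variants and prints the text each match actually consumes next to group(1). The captured runs should be the same; only the first consumed text differs, because ^ consumes nothing while the look-behind variant consumes the first letter. LookbehindAnchorDemo and describe are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindAnchorDemo {
    // Hypothetical helper: formats every match as 'consumed text' -> group 1.
    static List<String> describe(String regex, String data) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(data);
        while (m.find()) {
            out.add("'" + m.group() + "' -> " + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "aaaabbbaaaaab";
        // ^ as its own alternative: the first match is empty.
        System.out.println(describe("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)", data));
        // prints ['' -> aaaabbb, 'b' -> bbbaaaaa, 'a' -> aaaaab]

        // ^ inside the look-behinds: the first match consumes the leading a.
        System.out.println(describe("(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)", data));
        // prints ['a' -> aaaabbb, 'b' -> bbbaaaaa, 'a' -> aaaaab]
    }
}
```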