The preferred approach is at the end of this answer.
It seems you are looking for the look-around mechanism.
For instance, if you want to split on whitespace that has no foo before it and no bar after it, your code can look like
split("(?<!foo)\\s(?!bar)")
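As a minimal sketch of how this look-around split behaves, the class below wraps the same regex in a helper. The class name, method name and sample input are made up for illustration.

```java
import java.util.Arrays;

public class LookAroundSplitDemo {
    // Splits on a single whitespace character that is not preceded by "foo"
    // and not followed by "bar". (Hypothetical helper name.)
    static String[] splitOnFreeWhitespace(String input) {
        return input.split("(?<!foo)\\s(?!bar)");
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration only.
        String input = "one two foo three bar four";
        // prints [one, two, foo three bar, four]
        System.out.println(Arrays.toString(splitOnFreeWhitespace(input)));
    }
}
```

The space after foo and the space before bar are kept, because the look-behind or look-ahead rejects them; the look-arounds themselves consume no characters.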
Update (assuming that [...] blocks cannot be nested and are well formed, i.e. every [ is closed with a ]):
Your case seems a little more complex. What you can do is accept a comma , if
- it doesn't have any [ or ] after it,
- or the first opening bracket [ after this comma has no closing bracket ] between this comma and itself. Otherwise the comma would be inside an area like
[ , ] [
  ^ ^ ^ - first `[` after tested comma
  | +---- one `]` between tested comma and first `[` after it
  +------ tested comma
So your code can look like this
(this is the original version; a slightly simplified one is given further down)
split(",(?=[^\\]]*(\\[|$))")
This regex is based on the idea that the commas you don't want to accept are inside [foo,bar]. But how do we determine whether we are inside (or outside) such a block?
- if a character is inside, there will be no [ character after it until we find ] (the next [ can appear only after that ]; e.g. in [a,b],[c,d] the comma between a and b has no [ before the next ], but a new [...] area, which of course starts with [, can follow it)
- if a character is outside a [...] area, only non-] characters can appear after it until we find the start of a [...] area or reach the end of the string.
The second case is the one you are interested in. So we need to create a regex that accepts a , which has only non-] characters after it (it is not inside [...]) until it finds [ or reaches the end of the string (represented by $).
Such a regex can be written as
, comma
(?=...) which has after it
[^\\]]*(\\[|$)
[^\\]]* zero or more non-] characters (] needs to be escaped because it is a metacharacter)
(\\[|$) followed by [ (it also needs to be escaped in regex) or the end of the string
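To see the look-ahead version in action, here is a small sketch that applies it to two sample strings. The class and method names are hypothetical, and it relies on the same assumption as above: brackets are not nested and are always closed.

```java
import java.util.Arrays;

public class LookAheadCommaSplit {
    // Splits on commas that are followed only by non-']' characters
    // until the next '[' or the end of the string.
    // Assumes [...] blocks are not nested and are well formed.
    static String[] splitOutsideBrackets(String input) {
        return input.split(",(?=[^\\]]*(\\[|$))");
    }

    public static void main(String[] args) {
        // prints [a, b, [c,d], e]
        System.out.println(Arrays.toString(splitOutsideBrackets("a,b,[c,d],e")));
        // prints [[a,b], [c,d]]
        System.out.println(Arrays.toString(splitOutsideBrackets("[a,b],[c,d]")));
    }
}
```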
Slightly simplified split version
string.split(",(?![^\\[]*\\])");
Which means: split on a comma , that is not followed (the negative look-ahead (?!...)) by an unmatched ]. A ] is unmatched when there is no [ between the tested comma and itself, which can be written as [^\\[]*\\]
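A quick way to convince yourself that the negative look-ahead behaves like the longer positive one is to run both on the same inputs. This is only a sketch: the class name, method names and sample strings are invented for the comparison, and the inputs are assumed to contain well-formed, non-nested brackets.

```java
import java.util.Arrays;

public class CompareCommaSplits {
    // Original version: positive look-ahead.
    static String[] splitPositive(String input) {
        return input.split(",(?=[^\\]]*(\\[|$))");
    }

    // Simplified version: negative look-ahead.
    static String[] splitNegative(String input) {
        return input.split(",(?![^\\[]*\\])");
    }

    public static void main(String[] args) {
        // Illustrative inputs with well-formed, non-nested brackets.
        String[] samples = {"a,b,[c,d],e", "[a,b],[c,d]", "x,y,z"};
        for (String s : samples) {
            boolean same = Arrays.equals(splitPositive(s), splitNegative(s));
            System.out.println(s + " -> " + Arrays.toString(splitNegative(s))
                    + " (same as original regex: " + same + ")");
        }
    }
}
```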
Preferred approach
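To avoid such a complex regex, don't use split; use the Pattern and Matcher classes instead to search for areas like [...] or runs of non-comma characters.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String string = "a,b,[c,d],e";
// Match either a whole [...] block or a run of non-comma characters
Pattern p = Pattern.compile("\\[.*?\\]|[^,]+");
Matcher m = p.matcher(string);
while (m.find()) {
    System.out.println(m.group());
}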
Output:
a
b
[c,d]
e
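If you need the tokens as a collection rather than printed output, the same Pattern/Matcher loop can be wrapped in a small helper. This is a sketch only; the class name BracketTokenizer and the method tokenize are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketTokenizer {
    // A whole [...] block, or a run of one or more non-comma characters.
    private static final Pattern TOKEN = Pattern.compile("\\[.*?\\]|[^,]+");

    // Collects every match of TOKEN, in order of appearance.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [a, b, [c,d], e]
        System.out.println(tokenize("a,b,[c,d],e"));
    }
}
```

One difference from split worth keeping in mind: because [^,]+ requires at least one character, empty fields (for example the gap in "a,,b") produce no token at all, whereas split would return an empty string there.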