As you said, you already have a working solution for a different split char in Split string on single forward slashes with RegExp. That code does not actually split the string, but it matches everything except "/"s. It then returns the result of each individual match in a collection (yes, it ends up splitting).
What you need to do here is match each character in str, unless the next characters are either // or and. We can use a lookahead for this.
Just change the pattern in your code with the following:
.Pattern = "(?!$)((?:(?!//|\band\b).)*)(?://|and|$)"
Alternatively, if you want to trim spaces for each token, use the following regex:
.Pattern = "(?!$)((?:(?!\s*//|\s*\band\b).)*)\s*(?://|and|$)\s*"
Although this will also match the // or and, it's using a ( group ) to capture the actual token. Therefore, you have to add the tokens to the collection using .SubMatches(0) (what was backreferenced by the first group).
In your code, instead of adding coll.Add r_item.Value, use:
coll.Add r_item.SubMatches(0)
Note: if your string has line breaks, don't forget to set the rExp object with .Multiline = True.
VBA Code:
Sub GetMatches(ByRef str As String, ByRef coll As Collection)
Dim rExp As Object, rMatch As Object
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = True
.MultiLine = True
.Pattern = "(?!$)((?:(?!\s*//|\s*\band\b).)*)\s*(?://|and|$)\s*"
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
For Each r_item In rMatch
coll.Add r_item.subMatches(0)
Next r_item
End If
End Sub
And this is how you can call it with your example:
Dim text As String
text = "t/xt1.//text2,and landslide/ andy // text3- and text4"
'vars to get result of RegExp
Dim matches As New Collection, token
Set matches = New Collection
'Exec the RegExp --> Populate matches
GetMatches text, matches
'Print each token in debug window
For Each token In matches
Debug.Print "'" & token & "'"
Next token
Debug.Print "======="
Each token is printed in the Immediate Window.
- This code is a modified version of the code originally posted by @stribizhev
Output in Immediate Window:
't/xt1.'
'text2,'
'landslide/ andy'
'text3-'
'text4'
=======
More in-depth explanation
You may wonder how this pattern works. I'll try to explain with a detailed description. And to do that, let's take only the significant parts of the pattern, using the following regex (the rest isn't really important):
((?:(?!//|\band\b).)*)(?://|and|$)
It can easily be divided in two constructs:
- First, the subpattern
((?:(?!//|\band\b).)*) is a group that matches each token, backreferencing the text we want to return for each match. In vba, groups are returned with .SubMatches(). Let's brake it down:
- The inner expression
(?!//|\band\b). first checks to guarantee it's not followed by a split string ("//" or "and"). If it's not, the regex engine matches one character (notice the dot at the end). And that's it, it matches one character allowed as part of the token we're capturing.
- Now, it's enclosed in
(?:(?!//|\band\b).)* to repeat it for every char it can match, we get all the characters in the token. This construct is the closest it can get to a while loop.
While it's not followed by a split string, get next char.
- If you think about it, it's the construct
.* we all know, with an extra condition for each character.
- The second subpattern
(?://|and|$) is easier, simply match a split string ("//", "and" or the end of line). It's inside a non-capturing group, meaning it will be matched, but it won't store a copy of its value.
For example:
text1 a/s and text2 a/b//last
^ ^| | [1]: 1st subpattern, captured in Matches(0).SubMatches(0)
|--------|^-^
| 1 2| [2]: Split string, not captured but included in match
|-----------|
3 [3]: The whole match, returned by Matches(0)
For the second match, Matches(1).Value = " text2 a/b//"
Matches(1).Submatches(0) = " text2 a/b"
The rest of the pattern are simply details:
(?!$) is to avoid matching an empty string at the end of the line.
- All the
\s* are there to trim the token (to avoid capturing whitespaces at the beggining or end of a token).