There are many questions which ask to find the longest repeating substring:
- Longest substring that occurs at least twice: C++ question
- Find longest repeating substring in JavaScript using regular expressions
- Find longest repeating strings?
- Regex to match the longest repeating substring
But these don't exactly match my requirements, which are:
- The substrings may not overlap (they are disjoint).
- The substrings are allowed to be non-adjacent.
- Any characters are allowed.
- I would like to match the longest pattern like this.
So far I have this:
>>> m = re.match(".*(?P<grp>.+).*(?P=grp).*", "dhiblhip")
>>> m.group('grp')
'i'
I think this is matching the last substring which repeats itself, 'i', but that's certainly not the longest one. I'd expect the following output for the following input:
'123abc'->'''hh'->'h''hihi'->'hi''dhiblhip'->'hi''phiblhip'->'hi'(note how I do not return'p'since it is not as long as'hi'even though it is a repeating disjoint substring.)'racecaracecar'->'raceca'(note how I can't recycle the middler.) In this case,'acecar'is just as acceptable.
I am using Python's re and would like to continue to do so, but answers in another language are not unwelcome.