I have the following code for finding a pattern (a consecutively repeated substring) in a string, for example 0110110110000. The desired output is both 011 and 110, since each is repeated consecutively within the string. How should the code below be changed to return both?
I'd like to identify substrings that can start at any position in a given string and that repeat consecutively at least a threshold number of times. For the string above, the threshold is three (th = 3). The repeated substring should be maximal, that is, the longest one that meets the threshold. In the string above, both 110 and 011 satisfy these conditions.
Here's my attempt at doing this. It stops at the first match, so for the string above it returns only 011:
reps <- function(s, n) paste(rep(s, n), collapse = "") # repeat s n times
find.string <- function(string, th = 3, len = floor(nchar(string)/th)) {
  for (k in len:1) {
    pat <- paste0("(.{", k, "})", reps("\\1", th - 1))
    r <- regexpr(pat, string, perl = TRUE)
    if (attr(r, "capture.length") > 0) break
  }
  if (r > 0) substring(string, r, r + attr(r, "capture.length") - 1) else ""
}
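One direction that might work, offered as a rough sketch rather than a definitive fix: wrap the same pattern in a lookahead so that every start position is tested, and collect all distinct blocks found at the largest block length that matches. The name `find.all.strings` is hypothetical and not part of the attempt above. The sketch assumes that "maximal" means the longest block length that reaches the threshold, and it relies on `gregexpr` reporting `capture.start` when `perl = TRUE`.

```r
# Hypothetical helper (not from the original attempt): return every distinct
# block of the largest length k that occurs th times in a row, starting at
# any position in the string.
find.all.strings <- function(string, th = 3, len = floor(nchar(string) / th)) {
  if (len < 1) return(character(0))  # string too short for th repeats
  for (k in len:1) {
    # The lookahead makes each match zero-width, so overlapping start
    # positions are all tested instead of only the leftmost one.
    pat <- paste0("(?=(.{", k, "})", strrep("\\1", th - 1), "))")
    m <- gregexpr(pat, string, perl = TRUE)[[1]]
    if (m[1] > 0) {
      # One capture group, so capture.start is a one-column matrix.
      start <- as.vector(attr(m, "capture.start"))
      return(unique(substring(string, start, start + k - 1)))
    }
  }
  character(0)
}

find.all.strings("0110110110000")
```

For the example string, this should return both 011 and 110, because the three-fold repeat matches at positions 1 and 2 and at no later position. Shorter blocks such as 0 (repeated in the trailing 0000) are not reported, since the loop stops at the largest block length with a match.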