I am working in R with sequential data. Specifically, I have a vector of integers in which the same values recur in various sequences. I am trying to write code that identifies how many different sequences appear.
Currently I do this manually: I predefine the patterns that I know exist and apply a function that counts the occurrences of each one.
I first use RMySQL to run the query, whose result is stored in the variable product_process_history_joined. Then I extract the column of interest into the variable data. Next, I define the pattern my function should look for, and finally I apply the function, which counts the number of occurrences.
The code:
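# Assumes `con` is an open RMySQL connection created earlier with dbConnect().
product_process_history_joined <- dbGetQuery(con, "SELECT *
                                       FROM product, process_history
                                       WHERE product.idproduct = process_history.product_idproduct")
# Vector of process type ids, in the order returned by the query
data <- product_process_history_joined$process_types_idprocess_types

# First pattern: x is TRUE at every start position where the window equals pat.
# The + 1 makes sure the last possible window is checked as well.
pat <- c(1, 2, 4, 5, 6)
x <- sapply(1:(length(data) - length(pat) + 1), function(i) all(data[i:(i + length(pat) - 1)] == pat))
route <- data[which(x)]
countR <- length(route)

# Second pattern, same procedure
pat1 <- c(1, 2, 4, 5, 7, 9, 7, 7, 2, 5, 6, 10)
x <- sapply(1:(length(data) - length(pat1) + 1), function(i) all(data[i:(i + length(pat1) - 1)] == pat1))
route1 <- data[which(x)]
countR1 <- length(route1)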
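Since the same block is repeated for every pattern, it could be folded into one helper that is applied to a named list of patterns. The following is only a sketch: count_pattern, sample_data and patterns are hypothetical names, and sample_data is a small made-up vector, not the real query result.

```r
# Hypothetical helper: count how often `pat` occurs as a contiguous run in `data`.
count_pattern <- function(data, pat) {
  n_windows <- length(data) - length(pat) + 1
  if (n_windows < 1) return(0L)  # pattern is longer than the data
  hits <- vapply(seq_len(n_windows), function(i) {
    all(data[i:(i + length(pat) - 1)] == pat)
  }, logical(1))
  sum(hits)
}

# Small made-up vector for illustration only.
sample_data <- c(1, 4, 5, 6, 1, 2, 4, 5, 6, 10, 1, 2, 4, 5, 6)

# All predefined patterns in one named list.
patterns <- list(
  pat  = c(1, 2, 4, 5, 6),
  pat1 = c(1, 2, 4, 5, 7, 9, 7, 7, 2, 5, 6, 10)
)

# One count per pattern, named after the list entries.
counts <- vapply(patterns, count_pattern, integer(1), data = sample_data)
print(counts)
```

This still requires every pattern to be known in advance, so it only removes the copy-and-paste, not the manual step.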
The dataset that is produced and stored in the data variable looks like this:
  [1]  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5
 [36]  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  4
 [71]  5  6  1  4  5  6  1  4  5  6  1  4  5  6  1  2  4  5  6 10  1  2  4  5  7  9  7  7  2  5  6 10  1  2  4
[106]  5  6 10  1  2  4  5  6 10  1  2  4  8  1  2  3  5  7  8  1  2  3  5  6  1  2  3  5  6  1  2  4  5  6 10
This is just a subset of the data. I use around 12 different patterns. For the given dataset, the results for the first two patterns are 21 for pat and 1 for pat1.
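To make the goal concrete, here is a rough sketch of the kind of result I am after, without predefining any pattern. It rests on an assumption that holds in the sample above but may not hold in general: every sequence starts with process type 1. The names sample_data, route_id, route_keys and route_counts are made up for illustration, and sample_data is not the real data.

```r
# Made-up sample in the same shape as the output above.
sample_data <- c(1, 4, 5, 6, 1, 4, 5, 6, 1, 2, 4, 5, 6, 10, 1, 2, 4, 8)

# Assumption: a new sequence begins wherever process type 1 appears.
route_id <- cumsum(sample_data == 1)

# Collapse each sequence into a string key such as "1-4-5-6".
route_keys <- vapply(split(sample_data, route_id), paste, character(1), collapse = "-")

# Frequency of each distinct sequence, most common first.
route_counts <- sort(table(route_keys), decreasing = TRUE)
print(route_counts)

# Number of different sequences found.
print(length(route_counts))
```

If a sequence can contain a 1 somewhere other than its first position, this split would cut it in the wrong place, so it is not a general solution.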
 
     
     
    