I have arrays of time series, averaging about 1000 values per array. I need to independently identify time series segments in each array.
My current approach is to sort the values, compute the mean of the gaps between consecutive items, and start a new segment whenever a gap exceeds that mean. I couldn't find much information on standard ways to do this, and I'm sure there are more appropriate methods.
This is the code that I'm currently using.
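def time_cluster(input)
  # Sort a copy so the caller's array is not mutated.
  sorted = input.sort
  # Gaps between consecutive values.
  differences = sorted.each_cons(2).map { |a, b| b - a }
  # Array#mean is not core Ruby, so compute the mean gap explicitly.
  mean = differences.empty? ? 0.0 : differences.sum.to_f / differences.size
  clusters = []
  j = 0
  sorted.each_index do |i|
    # Start a new cluster when the gap before this item exceeds the mean gap.
    j += 1 if i > 0 && differences[i - 1] > mean
    (clusters[j] ||= []) << sorted[i]
  end
  clusters
end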
A couple of sample runs of this code:
time_cluster([1, 2, 3, 4, 7, 9, 250, 254, 258, 270, 292, 340, 345, 349, 371, 375, 382, 405, 407, 409, 520, 527])
Outputs
1  2  3  4  7  9, sparsity 1.3
250  254  258  270  292, sparsity 8.4
340  345  349  371  375  382  405  407  409, sparsity 7
520  527, sparsity 3
Another array
time_cluster([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200])
Outputs
1  2  3  4  5  6  7  8  9  10, sparsity 0.9
1000  1020  1040  1060  1080, sparsity 16
1200
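For reference, the same mean-gap rule can be sketched more compactly with `Enumerable#slice_when`. This is only a restatement of the approach above, not a better method: it assumes Ruby 2.4+ (for `Array#sum`), the name `time_cluster_compact` is made up for illustration, and it does not compute the sparsity figures.

```ruby
# Hypothetical name; same rule as time_cluster:
# split wherever the gap to the next value exceeds the mean gap.
def time_cluster_compact(values)
  sorted = values.sort                              # non-mutating sort
  return [] if sorted.empty?
  gaps = sorted.each_cons(2).map { |a, b| b - a }   # consecutive differences
  return [sorted] if gaps.empty?                    # a single value is one cluster
  mean_gap = gaps.sum.to_f / gaps.size
  sorted.slice_when { |a, b| b - a > mean_gap }.to_a
end

p time_cluster_compact([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200])
# => [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1000, 1020, 1040, 1060, 1080], [1200]]
```

Because the threshold is the mean gap, a single very large gap raises it for the whole array, which is why 1200 ends up alone in the second sample while the 20-unit gaps stay inside one cluster.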