According to the article, which defines BIDE:
BIDE: Efficient Mining of Frequent Closed Sequences
Theorem 2 (BackScan search space pruning):
Let the prefix sequence be an n-sequence,
Sp=e1e2...en. If∃i(1≤i≤n)and there exists an iteme′which appears in each of the i-th semi-maximum periods of the prefixSpinSDB, we can safely stop growing prefixSp.Proof:
Because item
e′appears in each of the i-th semi-maximum periods of the prefixSpinSDB, we can get a new prefixS′p=e1e2...ei−1e′ei...en(1<i≤n)orS′p=e′e1e2...en(i=1), and both(Sp ⊂ S′p)and(supSDB(Sp)=supSDB(S′p))hold. Any locally frequent iteme′′w.r.t. prefixSpis also a locally frequent item w.r.t.S′p, in the meantime(〈Sp,e′′〉⊂〈S′p,e′′〉)and(supSDB(〈Sp,e′′〉)=supSDB(〈S′p,e′′〉))hold. This means there is no hope to mine frequent closed sequences with prefixSp.
I understand that for example if I have an AB pattern, and I find an e', for example C, which is in the 2nd maximum period of the pattern, so between A and B for every sequence, which contains AB, then I can stop growing AB, because I could use backward extension to make ACB, which will have the same support, but it is longer. So any pattern I would get by extending AB forward, certainly won't be a closed pattern, because the C is missing from it. That's why I have to stop growing AB and wait until PrefixSpan grows A -> AC -> ACB with forward extension. What I don't understand why the maximum period must be constrained to the semi-maximum period in this context and why testing for the semi-maximum period is okay. The article does not write a real explanation. Any idea? Can you write an example where we lose closed frequent patterns by using the maximum period instead of the semi-maximum period?