In the following example, assume that the functions f1 to f4 are short and inlined, but slow. At iteration i = j, it is already clear to me which branch iteration i = j+1 will take, because that depends only on the value of data[j+1]. So the branch could be predicted in advance, while iteration i = j is still being computed.
How can I help the x86 branch predictor see this? Or does it already see it without any changes on my side? If so, how does that work?
int foo(int* data, int n) {
    int x = 0;
    for (int i = 0; i < n; i++) {
        switch (data[i]) {
            case 0: x = f1(x); break;
            case 1: x = f2(x); break;
            case 2: x = f3(x); break;
            default: x = f4(x); break;
        }
    }
    return x;
}
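
To make the idea concrete, here is a minimal sketch of what I mean by knowing the next branch in advance: the selector for iteration i+1 is loaded while iteration i is still running. The bodies of f1 to f4 are placeholders I made up for illustration, and foo_lookahead is a hypothetical name. I do not know whether writing it this way changes anything for the predictor (the compiler may well undo the reordering), which is part of what I am asking.

```c
/* Placeholder bodies: stand-ins for the real (slow, inlined) f1..f4. */
static inline int f1(int x) { return x + 1; }
static inline int f2(int x) { return x * 2; }
static inline int f3(int x) { return x - 3; }
static inline int f4(int x) { return -x; }

/* Same result as foo(), but the switch selector for the next
 * iteration is read one iteration early. Hypothetical sketch only. */
int foo_lookahead(const int *data, int n) {
    int x = 0;
    if (n <= 0) {
        return x;
    }
    int sel = data[0];  /* selector for iteration 0 */
    for (int i = 0; i < n; i++) {
        /* Load the selector for iteration i+1 before doing the work
         * of iteration i; the value 0 on the last pass is never used. */
        int next = (i + 1 < n) ? data[i + 1] : 0;
        switch (sel) {
            case 0: x = f1(x); break;
            case 1: x = f2(x); break;
            case 2: x = f3(x); break;
            default: x = f4(x); break;
        }
        sel = next;  /* already known before the next switch executes */
    }
    return x;
}
```

With these placeholder bodies, calling foo_lookahead on an array such as {0, 1, 2, 3} visits the four cases in the same order as foo would; only the point at which data[i+1] is read differs.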