I'm trying to implement the code shown in this pdf. More precisely (page 50):
#define SM (CLS / sizeof (double))
for (i = 0; i < N; i += SM)
  for (j = 0; j < N; j += SM)
    for (k = 0; k < N; k += SM)
      for (i2 = 0, rres = &res[i][j],
           rmul1 = &mul1[i][k]; i2 < SM;
           ++i2, rres += N, rmul1 += N)
         for (k2 = 0, rmul2 = &mul2[k][j];
              k2 < SM; ++k2, rmul2 += N)
           for (j2 = 0; j2 < SM; ++j2)
              rres[j2] += rmul1[k2] * rmul2[j2];
As far as I can tell, rres, rmul1 and rmul2 are all int*.
Should it look like this?
int *rres;
int *rmul1;
int *rmul2;
for (i = 0; i < N; i += SM)
   for (j = 0; j < N; j += SM)
      for (k = 0; k < N; k += SM)
         for (i2 = 0, *rres = &res[i][j],
              *rmul1 = &mul1[i][k]; i2 < SM;
              ++i2, rres += N, rmul1 += N)
            for (k2 = 0, *rmul2 = &mul2[k][j];
                 k2 < SM; ++k2, rmul2 += N)
               for (j2 = 0; j2 < SM; ++j2)
                  rres[j2] += rmul1[k2] * rmul2[j2];
This seems more or less reasonable to me, but it gives wrong results. For example, with two 2 x 2 matrices filled with random values of 0 or 1, I get:
-1520010527 23996350 
212687419 207125308
which is clearly wrong. My guess is that I am using * incorrectly, but I can't tell where.
EDIT:
Declaration of res:
  int **res = (int **)malloc(N * sizeof(int *));
  for (int i = 0; i < N; i++) {
      res[i] = (int *)malloc(N * sizeof(int));
  }
  for (int i = 0; i < N; i++) {
      for (int j = 0; j < N; j++) {
          res[i][j] = 0;
      }
  }
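For comparison, here is a minimal sketch of the blocked loop with pointer handling that is well defined. It is not the code from the PDF, and it rests on several assumptions that are mine: the function name `blocked_mul` is hypothetical, `CLS` is hard-coded to 64 bytes, `SM` is computed from `sizeof (int)` rather than `sizeof (double)`, each matrix is one contiguous row-major block (not an array of separately allocated rows as in the declaration above), and `n` is a multiple of `SM`. The pointers are assigned with `rres = &...`, without a leading `*`.

```c
#include <assert.h>
#include <stdlib.h>

#define CLS 64                      /* assumption: cache line size in bytes */
#define SM  (CLS / sizeof (int))    /* block edge length, in elements */

/* Hypothetical helper: res += mul1 * mul2 for n x n int matrices.
   Assumes each matrix is a single contiguous row-major array,
   n is a multiple of SM, and res has been zero-initialised. */
void blocked_mul(size_t n, int *res, const int *mul1, const int *mul2)
{
    size_t i, j, k, i2, k2, j2;
    int *rres;
    const int *rmul1, *rmul2;

    assert(n % SM == 0);            /* blocks must tile the matrix exactly */

    for (i = 0; i < n; i += SM)
        for (j = 0; j < n; j += SM)
            for (k = 0; k < n; k += SM)
                for (i2 = 0; i2 < SM; ++i2) {
                    /* assign the pointers themselves: no '*' on the left */
                    rres  = &res[(i + i2) * n + j];
                    rmul1 = &mul1[(i + i2) * n + k];
                    for (k2 = 0; k2 < SM; ++k2) {
                        rmul2 = &mul2[(k + k2) * n + j];
                        for (j2 = 0; j2 < SM; ++j2)
                            rres[j2] += rmul1[k2] * rmul2[j2];
                    }
                }
}
```

With this layout each matrix would be allocated once, for example `int *res = calloc(n * n, sizeof *res);`, and element (r, c) is `res[r * n + c]`. A 2 x 2 input does not satisfy the multiple-of-`SM` assumption, so the sketch would need padding or a separate path for small matrices.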
 
    