I have a 2D array of data stored in column-major (Fortran-style) format, and I'd like to take the FFT of each row. I would like to avoid transposing the array (it is not square). For example, my array
fftw_complex* data = new fftw_complex[21*256];
contains entries [r0_val0, r1_val0,..., r20_val0, r0_val1,...,r20_val255].
I can use fftw_plan_many_dft to make a plan to solve each of the 21 FFTs in-place in the data array if it is row-major, e.g. [r0_val0, r0_val1,..., r0_val255, r1_val0,...,r20_val255]:
int main() {
  int N = 256;
  int howmany = 21;
  fftw_complex* data = new fftw_complex[N*howmany];
  fftw_plan p;
  // this plan is OK
  p = fftw_plan_many_dft(1,&N,howmany,data,NULL,1,N,data,NULL,1,N,FFTW_FORWARD,FFTW_MEASURE);
  // do stuff...
  return 0;
}
According to the documentation (section 4.4.1 of the FFTW manual), the signature for the function is
fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
                              fftw_complex *in, const int *inembed,
                              int istride, int idist,
                              fftw_complex *out, const int *onembed,
                              int ostride, int odist,
                              int sign, unsigned flags);
and I should be able to use the stride and dist parameters to set the indexing. From what I can understand from the documentation, the entries in the array to be transformed are indexed as in + j*istride + k*idist where j=0..n-1 and k=0..howmany-1. (My arrays are 1D and there are howmany of them). However, the following code results in a seg. fault (edit: the stride length is wrong, see update below): 
int main() {
  int N = 256;
  int howmany = 21;
  fftw_complex* data = new fftw_complex[N*howmany];
  fftw_plan p;
  // this call results in a seg. fault.
  p = fftw_plan_many_dft(1,&N,howmany,data,NULL,N,1,data,NULL,N,1,FFTW_FORWARD,FFTW_MEASURE);
  return 0;
}
Update:
I made an error choosing the stride length. The correct call is (the correct stride length is howmany, not N):
int main() {
  int N = 256;
  int howmany = 21;
  fftw_complex* data = new fftw_complex[N*howmany];
  fftw_plan p;
  // OK
  p = fftw_plan_many_dft(1,&N,howmany,data,NULL,howmany,1,data,NULL,howmany,1,FFTW_FORWARD,FFTW_MEASURE);
  // do stuff  
  return 0;
}