gprof seems to fail to collect data from my program. Here is my command line:
g++ -Wall -O3 -g -pg -o fftw_test fftw_test.cpp -lfftw3 -lfftw3_threads -lm && ./fftw_test
Your program uses fftw library and probably consist almost only of fftw library calls. What is the running time? Your program may be too fast to be profiled with gprof. Update And library may be not seen by gprof as it was compiled without gprof profiling  enabled.
GNU gprof has two parts. First, it instruments function calls in c/cpp files which were compiled with -pg option (with mcount function calls - https://en.wikipedia.org/wiki/Gprof) - to get caller/callee info. Second, it links additional profiling library into your executable to add periodic sampling to find which code was executed for more time. Sampling is done with profil (setitimer). Setitimer profiling has limited resolution and can't resolve intervals less than 10 ms or 1 ms (100 or 1000 samples per second).
And in your example, the fftw library was probably compiled without instrumentation, so no mcount calls in it. It still can be captured by sampling part, but only for main thread of the program (https://en.wikipedia.org/wiki/Gprof - "typically it only profiles the main thread of application").
perf profiler have no instrumentation with mcount (it gets callee/caller from stack unwinding when recorded with -g option), but it have much better statistics/sampling variants (it can use hardware PMU counters), without 100 or 1000 Hz limit, and it supports (profiles) threads correctly. Try perf record -F1000 ./fftw_test (with 1 kHz sampling frequency) and perf report or perf report > report.txt. There are some GUI/HTML frontends to perf too: https://github.com/KDAB/hotspot https://github.com/jrfonseca/gprof2dot
For better setitimer style profiler check google-perftools https://github.com/gperftools/gperftools for "CPU PROFILER".
======
With your test I have some gprof results on Debian 8.6 Linux kernel version 3.16.0-4-amd64 x86_64 machine, g++ (Debian 4.9.2-10), gprof is "GNU gprof (GNU Binutils for Debian) 2.27"
$ cat gprof_test.cpp
#include <cstdlib>
#include <ctime>
#include <iostream>
int main()
{
   std::srand( std::time( 0 ) );
   double sum = 0.0;
   for ( int i = 0; i < 100000000; ++i )
      sum += std::rand() / ( double ) RAND_MAX;
   std::cout << sum << '\n';
   return 0;
}
$ g++ -Wall -O3 -g -pg -o gprof_test gprof_test.cpp && time ./gprof_test
5.00069e+06
real    0m0.992s
$ gprof -b gprof_test gmon.out
Flat profile:
Each sample counts as 0.01 seconds.
 no time accumulated
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00        1     0.00     0.00  _GLOBAL__sub_I_main
So, gprof did not catch any time samples in this 1 second example and have no information about calls into the libraries (they were compiled without -pg). After adding some wrapper functions and prohibiting inline optimization I have some data from gprof but library time was not accounted (it sees 0.72 seconds of 2 second runtime):
$ cat *cpp
#include <cstdlib>
#include <ctime>
#include <iostream>
int rand_wrapper1()
{
  return std::rand();
}
int rand_scale1()
{
  return rand_wrapper1() / ( double ) RAND_MAX;
}
int main()
{
   std::srand( std::time( 0 ) );
   double sum = 0.0;
   for ( int i = 0; i < 100000000; ++i )
    sum+= rand_scale1();
//      sum += std::rand() / ( double ) RAND_MAX;
   std::cout << sum << '\n';
   return 0;
}
$ g++ -Wall -O3 -fno-inline -g -pg -o gprof_test gprof_test.cpp && time ./gprof_test
real    0m2.345s
$ gprof -b gprof_test gmon.out
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
 80.02      0.57     0.57                             rand_scale1()
 19.29      0.71     0.14 100000000     1.37     1.37  rand_wrapper1()
  2.14      0.72     0.02                             frame_dummy
  0.00      0.72     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z13rand_wrapper1v
  0.00      0.72     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int) [clone .constprop.0]
                        Call graph
granularity: each sample hit covers 2 byte(s) for 1.39% of 0.72 seconds
index % time    self  children    called     name
                                                 <spontaneous>
[1]     97.9    0.57    0.14                 rand_scale1() [1]
                0.14    0.00 100000000/100000000     rand_wrapper1() [2]
-----------------------------------------------
                0.14    0.00 100000000/100000000     rand_scale1() [1]
[2]     19.0    0.14    0.00 100000000         rand_wrapper1() [2]
And perf sees all parts:
$ perf record ./gprof_test
0
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.388 MB perf.data (~16954 samples) ]
$ perf report |more   
# Samples: 9K of event 'cycles'
# Event count (approx.): 7373484231
#
# Overhead     Command      Shared Object                     Symbol
# ........  ..........  .................  .........................
#
    25.91%  gprof_test  gprof_test         [.] rand_scale1()
    21.65%  gprof_test  libc-2.19.so       [.] __mcount_internal
    13.88%  gprof_test  libc-2.19.so       [.] _mcount
    12.54%  gprof_test  gprof_test         [.] main
     9.35%  gprof_test  libc-2.19.so       [.] __random_r
     8.40%  gprof_test  libc-2.19.so       [.] __random
     3.97%  gprof_test  gprof_test         [.] rand_wrapper1()
     2.79%  gprof_test  libc-2.19.so       [.] rand
     1.41%  gprof_test  gprof_test         [.] mcount@plt
     0.03%  gprof_test  [kernel.kallsyms]  [k] memset