I have a simple sorting program, which I am profiling as a small case study for learning gprof; I plan to profile a much larger algorithm later.
I compiled with -pg and ran ./sort to produce the gmon.out file.
However, when I run gprof ./sort gmon.out, the values reported under cumulative seconds and self seconds do not look accurate to me.
First, running time(./sort) gives:
real 0m14.352s
user 0m14.330s
sys 0m0.005s
This agrees with my stopwatch.
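One way to cross-check these figures from inside the program is to time the sorting call itself with both a wall clock and the process CPU clock. The following is only a minimal sketch: timeCall and Timing are hypothetical names, not part of the original program. std::clock reports approximate processor time used by the process, so its value is roughly comparable to user + sys, while the steady_clock difference corresponds to real.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Wall-clock and CPU time consumed by one call, both in seconds.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Hypothetical helper (not from the original program): runs `work` once
// and measures it with a monotonic wall clock and the process CPU clock.
template <typename Work>
Timing timeCall(Work work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    // std::clock returns (clock_t)(-1) when processor time is unavailable.
    assert(cpu_start != static_cast<std::clock_t>(-1));

    work();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}
```

Wrapping the sort call as timeCall([&] { sort(buf); }) (with buf standing in for whatever vector the program sorts) would give a per-function figure to set against the gprof numbers.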
However, the gprof results for the flat profile are:
Each sample counts as 0.01 seconds.
% time  cumulative seconds  self seconds  calls  self s/call  total s/call  name
56.18 2.76 2.76 1 2.76 4.71 sort(std::vector<int, std::allocator<int> >&)
35.01 4.49 1.72 1870365596 0.00 0.00 std::vector<int, std::allocator<int> >::operator[](unsigned long)
8.96 4.93 0.44 100071 0.00 0.00 std::vector<int, std::allocator<int> >::size() const
0.00 4.93 0.00 50001 0.00 0.00 __gnu_cxx::new_allocator<int>::construct(int*, int const&)
0.00 4.93 0.00 50001 0.00 0.00 void __gnu_cxx::__alloc_traits<std::allocator<int> >::construct<int>(std::allocator<int>&, int*, int const&)
0.00 4.93 0.00 50001 0.00 0.00 std::vector<int, std::allocator<int> >::push_back(int const&)
0.00 4.93 0.00 50001 0.00 0.00 operator new(unsigned long, void*)
0.00 4.93 0.00 170 0.00 0.00 std::_Iter_base<int*, false>::_S_base(int*)
0.00 4.93 0.00 102 0.00 0.00 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*)
0.00 4.93 0.00 68 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const
0.00 4.93 0.00 68 0.00 0.00 std::_Miter_base<int*>::iterator_type std::__miter_base<int*>(int*)
0.00 4.93 0.00 52 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator()
0.00 4.93 0.00 51 0.00 0.00 __gnu_cxx::new_allocator<int>::max_size() const
0.00 4.93 0.00 34 0.00 0.00 __gnu_cxx::__alloc_traits<std::allocator<int> >::max_size(std::allocator<int> const&)
0.00 4.93 0.00 34 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&)
0.00 4.93 0.00 34 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const
0.00 4.93 0.00 34 0.00 0.00 std::vector<int, std::allocator<int> >::max_size() const
0.00 4.93 0.00 34 0.00 0.00 int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*)
0.00 4.93 0.00 34 0.00 0.00 int* std::__uninitialized_copy<true>::__uninit_copy<int*, int*>(int*, int*, int*)
0.00 4.93 0.00 34 0.00 0.00 int* std::__copy_move_a<false, int*, int*>(int*, int*, int*)
0.00 4.93 0.00 34 0.00 0.00 int* std::__copy_move_a2<false, int*, int*>(int*, int*, int*)
0.00 4.93 0.00 34 0.00 0.00 int* std::uninitialized_copy<int*, int*>(int*, int*, int*)
0.00 4.93 0.00 34 0.00 0.00 int* std::__uninitialized_copy_a<int*, int*, int>(int*, int*, int*, std::allocator<int>&)
0.00 4.93 0.00 34 0.00 0.00 int* std::__uninitialized_move_if_noexcept_a<int*, int*, std::allocator<int> >(int*, int*, int*, std::allocator<int>&)
0.00 4.93 0.00 34 0.00 0.00 int* std::copy<int*, int*>(int*, int*, int*)
0.00 4.93 0.00 18 0.00 0.00 void std::_Destroy_aux<true>::__destroy<int*>(int*, int*)
0.00 4.93 0.00 18 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long)
0.00 4.93 0.00 18 0.00 0.00 void std::_Destroy<int*>(int*, int*)
0.00 4.93 0.00 18 0.00 0.00 void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&)
0.00 4.93 0.00 17 0.00 0.00 __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long)
0.00 4.93 0.00 17 0.00 0.00 __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*)
0.00 4.93 0.00 17 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::difference_type __gnu_cxx::operator-<int*, std::vector<int, std::allocator<int> > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > const&, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > const&)
0.00 4.93 0.00 17 0.00 0.00 std::vector<int, std::allocator<int> >::_M_check_len(unsigned long, char const*) const
0.00 4.93 0.00 17 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long)
0.00 4.93 0.00 17 0.00 0.00 std::vector<int, std::allocator<int> >::_M_insert_aux(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int const&)
0.00 4.93 0.00 17 0.00 0.00 std::vector<int, std::allocator<int> >::end()
0.00 4.93 0.00 17 0.00 0.00 std::vector<int, std::allocator<int> >::begin()
0.00 4.93 0.00 17 0.00 0.00 unsigned long const& std::max<unsigned long>(unsigned long const&, unsigned long const&)
0.00 4.93 0.00 2 0.00 0.00 std::operator|(std::_Ios_Openmode, std::_Ios_Openmode)
0.00 4.93 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
0.00 4.93 0.00 1 0.00 0.00 generateData(std::basic_ofstream<char, std::char_traits<char> >&)
0.00 4.93 0.00 1 0.00 0.22 writeSortedFile(std::vector<int, std::allocator<int> >&)
0.00 4.93 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int)
0.00 4.93 0.00 1 0.00 0.00 loadBuf(std::vector<int, std::allocator<int> >&, std::basic_ifstream<char, std::char_traits<char> >&)
0.00 4.93 0.00 1 0.00 0.00 __gnu_cxx::new_allocator<int>::new_allocator()
0.00 4.93 0.00 1 0.00 0.00 __gnu_cxx::new_allocator<int>::~new_allocator()
0.00 4.93 0.00 1 0.00 0.00 std::allocator<int>::allocator()
0.00 4.93 0.00 1 0.00 0.00 std::allocator<int>::~allocator()
0.00 4.93 0.00 1 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl()
0.00 4.93 0.00 1 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl()
0.00 4.93 0.00 1 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_Vector_base()
0.00 4.93 0.00 1 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::~_Vector_base()
0.00 4.93 0.00 1 0.00 0.00 std::vector<int, std::allocator<int> >::vector()
0.00 4.93 0.00 1 0.00 0.00 std::vector<int, std::allocator<int> >::~vector()
So the cumulative seconds (4.93 s) do not add up to the true runtime (~14 s). The results do show that sort() is the most expensive function, but the absolute times are far too small.
Passing -z does not change this, but that is expected.
The call graph (not included) does not show anything that suggests where the missing seconds are; i.e., the extra seconds are not in children.
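For context, the call counts in the flat profile (about 1.87 billion operator[] calls after 50,001 push_back calls) are of the size one would expect from a quadratic sort whose element accesses are not inlined. The sketch below is a hypothetical stand-in, not the actual source of ./sort: bubbleSort and its access counter are assumptions made purely to illustrate how the operator[] count grows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the profiled sort(): a plain bubble sort.
// Returns how many times std::vector::operator[] was invoked.
unsigned long long bubbleSort(std::vector<int>& v) {
    unsigned long long accesses = 0;
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        // After pass i, the last i elements are already in place.
        for (std::size_t j = 0; j + 1 < v.size() - i; ++j) {
            accesses += 2;  // v[j] and v[j + 1] read for the comparison
            if (v[j] > v[j + 1]) {
                std::swap(v[j], v[j + 1]);
                accesses += 2;  // two more operator[] calls for the swap
            }
        }
    }
    // Sanity check; compiled out when NDEBUG is defined.
    assert(std::is_sorted(v.begin(), v.end()));
    return accesses;
}
```

In this sketch the comparisons alone account for n(n-1) operator[] calls, which for n around 50,000 is on the order of 10^9, the same order of magnitude as the count gprof reports.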
I get similar results (gprof reporting much smaller times than expected) when I profile the larger algorithm mentioned above: gprof says the runtime is ~450 s, whereas it actually takes over 3 hours. I thought this was because gprof cannot handle MPI, which the larger algorithm uses extensively, but now I think I am either misinterpreting the gprof results or missing some flags.
Is it possible I am not actually taking into account my gmon.out file?
The reason I ask is that gprof ./sort gives exactly the same results as gprof ./sort gmon.out, so it seems as though gmon.out is not even being used. I thought gmon.out was needed together with the executable to map time to functions. How can gprof produce output without gmon.out?
Any enlightening information is more than welcome, thanks in advance!
NOTE:
Reading around (e.g. this post), I found information suggesting that gprof has trouble analysing heap allocation (new) and the like. I should note that ./sort uses std::vector to hold its elements, which are allocated on the heap. Please let me know if this could be the issue.