As far as my knowledge on resource management goes, allocating something on the heap (operator new) should always be slower than allocating on the stack (automatic storage), because the stack is a LIFO-based structure, thus it requires minimal bookkeeping, and the pointer of the next address to allocate is trivial.
So far, so good. Now look at the following code:
/* ...includes... */
using std::cout;
using std::cin;
using std::endl;
int bar() { return 42; }
int main()
{
    auto s1 = std::chrono::steady_clock::now();
    std::packaged_task<int()> pt1(bar);
    auto e1 = std::chrono::steady_clock::now();
    auto s2 = std::chrono::steady_clock::now();
    auto sh_ptr1 = std::make_shared<std::packaged_task<int()> >(bar);
    auto e2 = std::chrono::steady_clock::now();
    auto first = std::chrono::duration_cast<std::chrono::nanoseconds>(e1-s1);
    auto second = std::chrono::duration_cast<std::chrono::nanoseconds>(e2-s2);
    cout << "Regular: " << first.count() << endl
         << "Make shared: " << second.count() << endl;
    pt1();
    (*sh_ptr1)();
    cout << "As you can see, both are working correctly: " 
         << pt1.get_future().get() << " & " 
         << sh_ptr1->get_future().get() << endl;
    return 0;
}
The results seem to contradict the stuff explained above:
Regular: 6131
Make shared: 843
As you can see, both are working correctly: 42 & 42
Program ended with exit code: 0
In the second measurement, apart from the call of operator new, the constructor of the std::shared_ptr (auto sh_ptr1) has to finish. I can't seem to understand why is this faster then regular allocation.
What is the explanation for this?
 
     
     
     
     
    