I am noticing huge performance differences in what is, at least conceptually, the same aggregate. The tests were made on a simple collection whose documents only have an _id, a name and a createdAt, but there are 20 million of them. There is an index on createdAt. It's hosted on an mLab cluster, version 3.6.9, WiredTiger.
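For reference, the documents and index look roughly like this (the values are made up, only the shape matters):

    // document shape
    { "_id": ObjectId("..."), "name": "foo", "createdAt": ISODate("2018-11-20T10:00:00Z") }

    // index on createdAt (direction assumed; a single-field index serves both sort directions)
    db.runnablecalls.createIndex({ createdAt: -1 })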
I am trying to get simple paging going using aggregate. I know I could use find with limit, but I'd like to add more stages to the pipeline later; the example I give here is very distilled.
    db.getCollection("runnablecalls").aggregate([
        {
            $facet: {
                docs: [
                    { $sort: { createdAt: -1 } },
                    { $limit: 25 },
                    { $skip: 0 }
                ],
                page_info: [
                    { $group: { _id: null, total: { $sum: 1 } } }
                ]
            }
        }
    ])
That takes almost 40s. Now if I move the $sort and $limit outside of the $facet, it takes 0.042s:
    db.getCollection("runnablecalls").aggregate([
        { $sort: { createdAt: -1 } },
        { $limit: 25 },
        {
            $facet: {
                docs: [
                    { $skip: 0 }
                ],
                page_info: [
                    { $group: { _id: null, total: { $sum: 1 } } }
                ]
            }
        }
    ])
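In case it is relevant, this is how I compare the two plans, e.g. for the fast variant:

    // executionStats shows whether the $sort walks the createdAt index
    db.getCollection("runnablecalls")
        .explain("executionStats")
        .aggregate([
            { $sort: { createdAt: -1 } },
            { $limit: 25 }
        ])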
The page_info facet makes no difference at the end; I can take it out without any change, I am just leaving it in because I'd like to use it. I know how to solve the problem using two queries: a count and an aggregate without a $facet. I'd just like to understand why this happens.
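For completeness, the two-query version I mean is roughly this:

    // fast: the $sort/$limit can use the createdAt index
    var docs = db.getCollection("runnablecalls").aggregate([
        { $sort: { createdAt: -1 } },
        { $skip: 0 },
        { $limit: 25 }
    ]).toArray();

    // separate query for the paging total
    var total = db.getCollection("runnablecalls").count();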