Using the double-pointer is by far the best compromise between execution speed/optimisation and legibility.  Using a single array to store matrix' contents is actually what a double-pointer does.
I have successfully used the following templated creator function (yes, I know I use old C-style pointer referencing, but it does make code more clear on the calling side with regards to changing parameters - something I like about pointers which is not possible with references. You will see what I mean):
///
/// Matrix Allocator Utility
/// @param pppArray Pointer to the double-pointer where the matrix should be allocated.
/// @param iRows Number of rows.
/// @param iColumns Number of columns.
/// @return Successful allocation returns true, else false.
template <typename T>
bool NewMatrix(T*** pppArray, 
               size_t iRows, 
               size_t iColumns)
{
   bool l_bResult = false;
   if (pppArray != 0) // Test if pointer holds a valid address.
   {                  // I prefer using the shorter 0 in stead of NULL.
      if (!((*pppArray) != 0)) // Test if the first element is currently unassigned.
      {                        // The "double-not" evaluates a little quicker in general.
         // Allocate and assign pointer array.
         (*pppArray) = new T* [iRows]; 
         if ((*pppArray) != 0) // Test if pointer-array allocation was successful.
         {
            // Allocate and assign common data storage array.
            (*pppArray)[0] = new T [iRows * iColumns]; 
            if ((*pppArray)[0] != 0) // Test if data array allocation was successful.
            {
               // Using pointer arithmetic requires the least overhead. There is no 
               // expensive repeated multiplication involved and very little additional 
               // memory is used for temporary variables.
               T** l_ppRow = (*pppArray);
               T* l_pRowFirstElement = l_ppRow[0];
               for (size_t l_iRow = 1; l_iRow < iRows; l_iRow++)
               {
                  l_ppRow++;
                  l_pRowFirstElement += iColumns;
                  l_ppRow[0] = l_pRowFirstElement;
               }
               l_bResult = true;
            }
         }
      }
   }
}
To de-allocate the memory created using the abovementioned utility, one simply has to  de-allocate in reverse.
///
/// Matrix De-Allocator Utility
/// @param pppArray Pointer to the double-pointer where the matrix should be de-allocated.
/// @return Successful de-allocation returns true, else false.
template <typename T>
bool DeleteMatrix(T*** pppArray)
{
   bool l_bResult = false;
   if (pppArray != 0) // Test if pointer holds a valid address.
   {
      if ((*pppArray) != 0) // Test if pointer array was assigned.
      {
         if ((*pppArray)[0] != 0) // Test if data array was assigned.
         {
               // De-allocate common storage array.
               delete [] (*pppArray)[0];
            }
         }
         // De-allocate pointer array.
         delete [] (*pppArray);
         (*pppArray) = 0;
         l_bResult = true;
      }
   }
}
To use these abovementioned template functions is then very easy (e.g.):
   .
   .
   .
   double l_ppMatrix = 0;
   NewMatrix(&l_ppMatrix, 3, 3); // Create a 3 x 3 Matrix and store it in l_ppMatrix.
   .
   .
   .
   DeleteMatrix(&l_ppMatrix);