I am a novice programmer and have only briefly covered the anatomy of a function call (setting up the stack, etc.). I can write a function in two different ways, and I'm wondering which, if either, is more efficient. This is for a finite element program, so the function could be called several thousand times. It uses the linear algebra library Armadillo.
First way:
void Q4::stiffness(mat &stiff) 
{
    stiff.zeros(); // sets all elements of the matrix to zero
    // a bunch of linear algebra calculations
    // ...
    stiff *= h;
}
int main()
{
    mat elementStiffness(Q4__DOF, Q4__DOF);
    mat globalStiffness(totalDOF, totalDOF);
    for (int i = 0; i < reallyHugeNumber; i++)
    {
        elements[i].stiffness(elementStiffness, PSTRESS); // passed by reference, so no address-of operator
        assemble(&globalStiffness, &elementStiffness);
    }
    return 0;
}
Second way:
mat Q4::stiffness() 
{
    mat stiff(Q4__DOF, Q4__DOF); // initializes element stiffness matrix
    // a bunch of linear algebra calculations
    // ...
    return stiff *= h;
}
int main()
{
    mat elementStiffness(Q4__DOF, Q4__DOF);
    mat globalStiffness(totalDOF, totalDOF);
    for (int i = 0; i < reallyHugeNumber; i++)
    {
        elementStiffness = elements[i].stiffness(PSTRESS);
        assemble(&globalStiffness, &elementStiffness);
    }
    return 0;
}
I think what I'm asking is this: in the second way, is mat stiff pushed onto the stack and then copied into elementStiffness? I imagine that pushing the matrix onto the stack and then copying it is much more expensive than passing a matrix by reference and setting its elements to zero.
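One way to check this empirically is to count copies and moves with a small stand-in type. The sketch below is only that: CountingMatrix, kDof, the stiffness_* functions and the run_* helpers are all hypothetical names invented for illustration, and the type is not Armadillo's mat, which may behave differently. It shows what the C++ rules do for the first way, for the second way as written (returning the result of `*=`), and for a variant that scales first and then returns the local by name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Global counters, reset by each run_* helper below.
int g_copies = 0;
int g_moves = 0;

constexpr std::size_t kDof = 8;  // placeholder size, standing in for Q4__DOF

// Hypothetical stand-in for a dense matrix; NOT Armadillo's mat.
struct CountingMatrix {
    std::vector<double> data;

    explicit CountingMatrix(std::size_t n) : data(n * n, 0.0) {}
    CountingMatrix(const CountingMatrix& o) : data(o.data) { ++g_copies; }
    CountingMatrix(CountingMatrix&& o) noexcept : data(std::move(o.data)) { ++g_moves; }
    CountingMatrix& operator=(const CountingMatrix& o) {
        data = o.data;
        ++g_copies;
        return *this;
    }
    CountingMatrix& operator=(CountingMatrix&& o) noexcept {
        data = std::move(o.data);
        ++g_moves;
        return *this;
    }
    // Scales in place and returns a reference to *this.
    CountingMatrix& operator*=(double h) {
        for (double& x : data) x *= h;
        return *this;
    }
};

// First way: the caller owns the storage and the function overwrites it.
void stiffness_out(CountingMatrix& stiff, double h) {
    assert(stiff.data.size() == kDof * kDof);
    std::fill(stiff.data.begin(), stiff.data.end(), 0.0);
    // ... element calculations would go here ...
    stiff *= h;
}

// Second way as written: `stiff *= h` is an lvalue reference, so the
// return value is copy-constructed from it (no elision, no implicit move).
CountingMatrix stiffness_return_ref(double h) {
    CountingMatrix stiff(kDof);
    return stiff *= h;
}

// Variant: scale first, then return the local by name, which allows
// the compiler to elide the copy or, failing that, to move.
CountingMatrix stiffness_return_named(double h) {
    CountingMatrix stiff(kDof);
    stiff *= h;
    return stiff;
}

struct Counts {
    int copies;
    int moves;
};

Counts run_out_param(int n) {
    g_copies = g_moves = 0;
    CountingMatrix element(kDof);
    for (int i = 0; i < n; ++i) stiffness_out(element, 2.0);
    return {g_copies, g_moves};
}

Counts run_return_ref(int n) {
    g_copies = g_moves = 0;
    CountingMatrix element(kDof);
    for (int i = 0; i < n; ++i) element = stiffness_return_ref(2.0);
    return {g_copies, g_moves};
}

Counts run_return_named(int n) {
    g_copies = g_moves = 0;
    CountingMatrix element(kDof);
    for (int i = 0; i < n; ++i) element = stiffness_return_named(2.0);
    return {g_copies, g_moves};
}
```

Calling each run_* helper with the same iteration count and comparing the returned Counts shows where deep copies happen for this stand-in type. It says nothing about how long each version takes, so timing the real Armadillo code is still needed to decide between the two.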