I have found on the internet (here and here), that the inheritance doesn't affect the performance of the class. I have become curious about that as I have been writing a matrices module for a render engine, and the speed of this module is very important for me.
After I have written:
- Base: general matrix class
- Derived from the base: square implementation
- Derived from derived: 3-dim and 4-dim implementations of the square matrix
I decided to test them and faced performance issues with instantiation
And so the main questions are:
- What's the reason of these performance issues in my case and why may they happen in general?
- Should I forget about inheritance in such cases?
This is how these classes look like in general:
template <class t>
class Matrix
{
protected:
    union {
        struct
        {
            unsigned int w, h;
        };
        struct
        {
            unsigned int n, m;
        };
    };
    /** Changes flow of accessing `v` array members */
    bool transposed;
    /** Matrix values array */
    t* v;
public:
    ~Matrix() {
        delete[] v;
    };
    Matrix() : v{}, transposed(false) {};
    // Copy
    Matrix(const Matrix<t>& m) : w(m.w), h(m.h), transposed(m.transposed) {
        v = new t[m.w * m.h];
        for (unsigned i = 0; i < m.g_length(); i++)
           v[i] = m.g_v()[i];
    };
    // Constructor from array
    Matrix(unsigned _w, unsigned _h, t _v[], bool _transposed = false) : w(_w), h(_h), transposed(_transposed) {
       v = new t[_w * _h];
       for (unsigned i = 0; i < _w * _h; i++)
           v[i] = _v[i];
    };
    /** Gets matrix array */
    inline t* g_v() const { return v; }
    /** Gets matrix values array size */
    inline unsigned g_length() const { return w * h; }
    // Other constructors, operators, and methods.
}
template<class t>
class SquareMatrix : public Matrix<t> {
public:
    SquareMatrix() : Matrix<t>() {};
    SquareMatrix(const Matrix<t>& m) : Matrix<t>(m) {};
    SquareMatrix(unsigned _s, t _v[], bool _transpose) : Matrix<t>(_s, _s, _v, _transpose) {};
    // Others...
}
template<class t>
class Matrix4 : public SquareMatrix<t> {
public:
    Matrix4() : SquareMatrix<t>() {};
    Matrix4(const Matrix<t>& m) : SquareMatrix<t>(m) {}
    Matrix4(t _v[16], bool _transpose) : SquareMatrix<t>(4, _v, _transpose) {};
    // Others...
}
To conduct tests I used this
void test(std::ofstream& f, char delim, std::function<void(void)> callback) {
    auto t1 = std::chrono::high_resolution_clock::now();
    callback();
    auto t2 = std::chrono::high_resolution_clock::now();
    f << std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count() << delim;
    //std::cout << "test took " << std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count() << " microseconds\n";
}
Performance problems
With single class initialization, there're no problems - it goes under 5 microseconds almost every time for every class. But then I decided to scale up the number of initializations and their several troubles occurred
I ran every test 100 times, with arrays of length 500
1. Class initialization with the default constructor
I just tested the initialization of arrays
The results were (avg time in microseconds):
- Matrix 25.19
- SquareMatrix 40.37 (37.60% loss)
- Matrix4 58.06 (30.47% loss from SquareMatrix)
And here we can already see a huge difference
Here's the code
int main(int argc, char** argv)
{
    std::ofstream f("test.csv");
    f << "Matrix\t" << "SquareMatrix\t" << "Matrix4\n";
    for (int k = 0; k < 100; k++) {
        test(f, '\t', []() {
            Matrix<long double>* a = new Matrix<long double>[500];
            });
        test(f, '\t', []() {
            SquareMatrix<long double>* a = new SquareMatrix<long double>[500];
            });
        test(f, '\n', []() {
            Matrix4<long double>* a = new Matrix4<long double>[500];
            });
    }
    f.close();
    return 0;
}
2. Class initialization with default constructor and filling
Tested the initialization of arrays of class instances and filling them after with custom matrices
The results (avg time in microseconds):
- Matrix 402.8
- SquareMatrix 475 (15.20% loss)
- Matrix4 593.86 (20.01% loss from SquareMatrix)
Code
int main(int argc, char** argv)
{
    long double arr[16] = {
       1, 2, 3, 4,
       5, 6, 7, 8,
       9, 10, 11, 12,
       13, 14,15,16
    };
    std::ofstream f("test.csv");
    f << "Matrix\t" << "SquareMatrix\t" << "Matrix4\n";
    for (int k = 0; k < 100; k++) {
        test(f, '\t', [&arr]() {
            Matrix<long double>* a = new Matrix<long double>[500];
            for (int i = 0; i < 500; i++) 
                a[i] = Matrix<long double>(4, 4, arr);
            });
        test(f, '\t', [&arr]() {
            SquareMatrix<long double>* a = new SquareMatrix<long double>[500];
            for (int i = 0; i < 500; i++) 
                a[i] = SquareMatrix<long double>(4, arr);
            });
        test(f, '\n', [&arr]() {
            Matrix4<long double>* a = new Matrix4<long double>[500];
            for (int i = 0; i < 500; i++) 
                a[i] = Matrix4<long double>(arr);
            });
    }
    f.close();
    return 0;
}
3. Filling vector with class instances
Pushed back custom matrices to vector
The results (avg time in microseconds):
- Matrix 4498.1
- SquareMatrix 4693.93 (4.17% loss)
- Matrix4 4960.12 (5.37% loss from its SquareMatrix)
Code
int main(int argc, char** argv)
{
    long double arr[16] = {
       1, 2, 3, 4,
       5, 6, 7, 8,
       9, 10, 11, 12,
       13, 14,15,16
    };
    std::ofstream f("test.csv");
    f << "Matrix\t" << "SquareMatrix\t" << "Matrix4\n";
    for (int k = 0; k < 100; k++) {
        test(f, '\t', [&arr]() {
            std::vector<Matrix<long double>> a = std::vector<Matrix<long double>>();
            for (int i = 0; i < 500; i++)
                a.push_back(Matrix<long double>(4, 4, arr));
            });
        test(f, '\t', [&arr]() {
            std::vector<SquareMatrix<long double>> a = std::vector<SquareMatrix<long double>>();
            for (int i = 0; i < 500; i++)
                a.push_back(SquareMatrix<long double>(4, arr));
            });
        test(f, '\n', [&arr]() {
            std::vector<Matrix4<long double>> a = std::vector<Matrix4<long double>>();
            for (int i = 0; i < 500; i++)
                a.push_back(Matrix4<long double>(arr));
            });
    }
    f.close();
    return 0;
}
If you need all the source code, you can look here into matrix.h and matrix.cpp
 
    