I have implemented one matrix multiplication with boost::numeric::ublas::matrix (see my full, working boost code)
Result result = read ();
boost::numeric::ublas::matrix<int> C;
C = boost::numeric::ublas::prod(result.A, result.B);
and another one with the standard algorithm (see full standard code):
vector< vector<int> > ijkalgorithm(vector< vector<int> > A, 
                                    vector< vector<int> > B) {
    int n = A.size();
    // initialise C with 0s
    vector<int> tmp(n, 0);
    vector< vector<int> > C(n, tmp);
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < n; k++) {
            for (int j = 0; j < n; j++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    return C;
}
This is how I test the speed:
time boostImplementation.out > boostResult.txt
diff boostResult.txt correctResult.txt
time simpleImplementation.out > simpleResult.txt
diff simpleResult.txt correctResult.txt
Both programs read a hard-coded textfile which contains two 2000 x 2000 matrices. Both programs were compiled with these flags:
g++ -std=c++98 -Wall -O3 -g $(PROBLEM).cpp -o $(PROBLEM).out -pedantic
I got 15 seconds for my implementation and over 4 minutes for the boost-implementation!
edit: After compiling it with
g++ -std=c++98 -Wall -pedantic -O3 -D NDEBUG -DBOOST_UBLAS_NDEBUG library-boost.cpp -o library-boost.out
I got 28.19 seconds for the ikj-algorithm and 60.99 seconds for Boost. So Boost is still considerably slower.
Why is boost so much slower than my implementation?