I wrote a small benchmark in which the program creates 10^8 two-dimensional vector structures of {float, float}, stores them in a std::vector, and then sums the squares of their lengths.
Here is the C++ code:
#include <iostream>
#include <chrono>
#include <vector>
#include <array>
#include <cmath>
    
using namespace std;
using namespace std::chrono;
    
const int COUNT = pow(10, 8);
    
class Vec {
public:
    float x, y;
    
    Vec() {}
    
    Vec(float x, float y) : x(x), y(y) {}
    
    float len() {
        return x * x + y * y;
    }
};
    
int main() {
    vector <Vec> vecs;
    
    for(int i = 0; i < COUNT; ++i) {
        vecs.emplace_back(i / 3, i / 5);
    }
    
    auto start = high_resolution_clock::now();
    
    // This loop is timed
    float sum = 0;
    for(int i = 0; i < COUNT; ++i) {
        sum += vecs[i].len();
    }
    
    auto stop = high_resolution_clock::now();
    
    cout << "finished in " << duration_cast <milliseconds> (stop - start).count()
         << " milliseconds" << endl;
    cout << "result: " << sum << endl;
    
    return 0;
}
For which I used this makefile (g++ version 7.5.0):
build:
 g++ -std=c++17 -O3 main.cpp -o program #-ffast-math 
    
run: build
 clear
 ./program
Here is my Java code:
public class MainClass {
    static final int COUNT = (int) Math.pow(10, 8);
    static class Vec {
        float x, y;
        Vec(float x, float y) {
            this.x = x;
            this.y = y;
        }
        float len() {
            return x * x + y * y;
        }
    }
    public static void main(String[] args) throws InterruptedException {
        Vec[] vecs = new Vec[COUNT];
        for (int i = 0; i < COUNT; ++i) {
            vecs[i] = new Vec(i / 3, i / 5);
        }
        long start = System.nanoTime();
        // This loop is timed
        float sum = 0;
        for (int i = 0; i < COUNT; ++i) {
            sum += vecs[i].len();
        }
        long duration = System.nanoTime() - start;
        System.out.println("finished in " + duration / 1000000 + " milliseconds");
        System.out.println("result: " + sum);
    }
}
Which was compiled and run using Java 11.0.4.
And here are the results (the average of a few runs, on Ubuntu 18.04 64-bit):
c++:  262 ms
java: 230 ms
Here are a few things I have tried in order to make the c++ code faster:
- Use std::array instead of std::vector
- Use a plain array instead of std::vector
- Use an iterator in the for loop
However, none of the above resulted in any improvement.
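For reference, the iterator-based variants of the timed loop can be sketched roughly as below. This is only an illustration of the kind of change that was tried, not the exact code from the benchmark: the function names sumRangeFor and sumAccumulate are hypothetical, and Vec is reduced to a plain struct. Both versions still add into a single float accumulator in strict left-to-right order, so they perform the same arithmetic as the indexed loop.

```cpp
#include <numeric>
#include <vector>

// Simplified stand-in for the Vec class from the benchmark.
struct Vec {
    float x, y;
    float len() const { return x * x + y * y; }
};

// Range-based for: iterates through begin()/end() iterators under the hood.
float sumRangeFor(const std::vector<Vec>& vecs) {
    float sum = 0.0f;
    for (const Vec& v : vecs) {
        sum += v.len();
    }
    return sum;
}

// std::accumulate with a float initial value keeps the accumulator a float
// and adds strictly left to right, like the indexed loop.
float sumAccumulate(const std::vector<Vec>& vecs) {
    return std::accumulate(vecs.begin(), vecs.end(), 0.0f,
                           [](float acc, const Vec& v) { return acc + v.len(); });
}
```

Either function could replace the body of the timed loop, for example as float sum = sumAccumulate(vecs);.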
I have noticed a few interesting things:
- When I time the whole main() function (allocation + computation), C++ is much better. However, this might be due to the warm-up time of the JVM.
- For a lower number of objects, like 10^7, C++ was slightly faster (by a few milliseconds).
- Turning on -ffast-math makes the C++ program a few times faster than Java; however, the result of the computation is slightly different. Furthermore, I read in a few posts that using this flag is unsafe.
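A likely reason the result changes under -ffast-math is that the flag allows the compiler to reorder floating-point additions, and float addition is not associative. The sketch below illustrates that effect on its own, outside the benchmark. The helper names sumSequential and sumTwoLanes are hypothetical, and the two-accumulator split is only an assumption about the kind of reordering a vectorizing compiler might apply.

```cpp
#include <cstddef>
#include <vector>

// One accumulator, strict left-to-right order (what the timed loop does).
float sumSequential(const std::vector<float>& values) {
    float sum = 0.0f;
    for (float v : values) {
        sum += v;
    }
    return sum;
}

// Two independent accumulators (even and odd indices), combined at the end.
// This changes the order of the additions, not the set of values added.
float sumTwoLanes(const std::vector<float>& values) {
    float even = 0.0f, odd = 0.0f;
    std::size_t i = 0;
    for (; i + 1 < values.size(); i += 2) {
        even += values[i];
        odd += values[i + 1];
    }
    if (i < values.size()) {
        even += values[i];  // leftover element when the size is odd
    }
    return even + odd;
}
```

With the input {1e8f, 1.0f, -1e8f, 1.0f}, sumSequential returns 1 because the first 1.0f is lost to rounding when it is added to 1e8f, while sumTwoLanes returns 2. Compiled without -ffast-math, each function keeps the order written in the source.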
Can I somehow modify my C++ code and make it as fast or faster than Java in this case?
 
     
    