I am trying to calculate e=∑(3−4k^2/(2k+1)!); k=0..10000
However I got stuck and can't get a desired performance boost using multithreading.
Given a number of threads, I tried to divide the whole sum to k / numberOfThreads chunks and submit futures for each partial sum.
I think the bad part might be the factorial calculation  or the granularity. I tried with a smaller step, but didn't get a big improvement. Maybe a different approach is needed.
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
List<Future<BigDecimal>> futures = new ArrayList<>(numberOfThreads);
int step = k / numberOfThreads ;
BigDecimal result = BigDecimal.ZERO;
for (int j = 0; j <= k; j += step) {
    Future<BigDecimal> future = executor.submit(new EulerCalculator(j, j + step));
    futures.add(future);
}
for (Future<BigDecimal> future : futures) {
    result = result.add(future.get());
}
public class EulerCalculator implements Callable<BigDecimal> {
    private int start;
    private int end;
    public BigDecimal call() {
        long numerator = 3 - 4 * start * start;
        BigDecimal denominator = factorial(2 * start + 1);
        BigDecimal partialSum = BigDecimal.valueOf(numerator)
                                .divide(denominator, 1000, RoundingMode.HALF_EVEN);
        for (int i = start + 1 ; i < end; i++) {
            numerator = 3 - 4 * i * i;
            denominator = denominator.multiply(BigDecimal.valueOf(2 * i * (2*i + 1)));
            partialSum = partialSum.add(BigDecimal.valueOf(numerator)
                                        .divide(fact, 1000, RoundingMode.HALF_EVEN));
        }
        return partialSum;
    }
    private BigDecimal factorial(int cur) {
        BigDecimal fact = BigDecimal.ONE;
        for (int i = 2; i <= cur; i++) {
            fact = fact.multiply(BigDecimal.valueOf(i));
        }
        return fact;
    }
}
Best results from a few runs on a quad-core:
k = 10000
threads = 1: 345ms
threads = 2: 216ms
threads = 4: 184ms
threads = 8: 225ms
