Can any one tell me why I can not successfully test OpenBLAS's dgemm performance (in GFLOPs) in R via the following way?
- link R with the "reference BLAS"
libblas.so - compile my C program
mmperf.cwith OpenBLAS librarylibopenblas.so - load the resulting shared library
mmperf.sointo R, call the R wrapper functionmmperfand reportdgemmperformance in GFLOPs.
Point 1 looks strange, but I have no choice because I have no root access on machines I want to test, so actual linking to OpenBLAS is impossible. By "not successfully" I mean my program ends up reporting dgemm performance for reference BLAS instead of OpenBLAS. I hope someone can explain to me:
- why my way does not work;
- is it possible at all to make it work (this is important, because if it is impossible, I must write a C
mainfunction and do my job in a C program.)
I've investigated into this issue for two days, here I will include various system output to assist you to make a diagnose. To make things reproducible, I will also include the code, makefile as well as shell command.
Part 1: system environment before testing
There are 2 ways to invoke R, either using R or Rscript. There are some differences in what is loaded when they are invoked:
~/Desktop/dgemm$ readelf -d $(R RHOME)/bin/exec/R | grep "NEEDED"
0x00000001 (NEEDED) Shared library: [libR.so]
0x00000001 (NEEDED) Shared library: [libpthread.so.0]
0x00000001 (NEEDED) Shared library: [libc.so.6]
~/Desktop/dgemm$ readelf -d $(R RHOME)/bin/Rscript | grep "NEEDED"
0x00000001 (NEEDED) Shared library: [libc.so.6]
Here we need to choose Rscript, because R loads libR.so, which will automatically load the reference BLAS libblas.so.3:
~/Desktop/dgemm$ readelf -d $(R RHOME)/lib/libR.so | grep blas
0x00000001 (NEEDED) Shared library: [libblas.so.3]
~/Desktop/dgemm$ ls -l /etc/alternatives/libblas.so.3
... 31 May /etc/alternatives/libblas.so.3 -> /usr/lib/libblas/libblas.so.3.0
~/Desktop/dgemm$ readelf -d /usr/lib/libblas/libblas.so.3 | grep SONAME
0x0000000e (SONAME) Library soname: [libblas.so.3]
Comparatively, Rscript gives a cleaner environment.
Part 2: OpenBLAS
After downloading source file from OpenBLAS and a simple make command, a shared library of the form libopenblas-<arch>-<release>.so-<version> can be generated. Note that we will not have root access to install it; instead, we copy this library into our working directory ~/Desktop/dgemm and rename it simply to libopenblas.so. At the same time we have to make another copy with name libopenblas.so.0, as this is the SONAME which run time loader will seek for:
~/Desktop/dgemm$ readelf -d libopenblas.so | grep "RPATH\|SONAME"
0x0000000e (SONAME) Library soname: [libopenblas.so.0]
Note that the RPATH attribute is not given, which means this library is intended to be put in /usr/lib and we should call ldconfig to add it to ld.so.cache. But again we don't have root access to do this. In fact, if this can be done, then all the difficulties are gone. We could then use update-alternatives --config libblas.so.3 to effectively link R to OpenBLAS.
Part 3: C code, Makefile, and R code
Here is a C script mmperf.c computing GFLOPs of multiplying 2 square matrices of size N:
#include <R.h>
#include <Rmath.h>
#include <Rinternals.h>
#include <R_ext/BLAS.h>
#include <sys/time.h>
/* standard C subroutine */
double mmperf (int n) {
/* local vars */
int n2 = n * n, tmp; double *A, *C, one = 1.0;
struct timeval t1, t2; double elapsedTime, GFLOPs;
/* simulate N-by-N matrix A */
A = (double *)calloc(n2, sizeof(double));
GetRNGstate();
tmp = 0; while (tmp < n2) {A[tmp] = runif(0.0, 1.0); tmp++;}
PutRNGstate();
/* generate N-by-N zero matrix C */
C = (double *)calloc(n2, sizeof(double));
/* time 'dgemm.f' for C <- A * A + C */
gettimeofday(&t1, NULL);
F77_CALL(dgemm) ("N", "N", &n, &n, &n, &one, A, &n, A, &n, &one, C, &n);
gettimeofday(&t2, NULL);
/* free memory */
free(A); free(C);
/* compute and return elapsedTime in microseconds (usec or 1e-6 sec) */
elapsedTime = (double)(t2.tv_sec - t1.tv_sec) * 1e+6;
elapsedTime += (double)(t2.tv_usec - t1.tv_usec);
/* convert microseconds to nanoseconds (1e-9 sec) */
elapsedTime *= 1e+3;
/* compute and return GFLOPs */
GFLOPs = 2.0 * (double)n2 * (double)n / elapsedTime;
return GFLOPs;
}
/* R wrapper */
SEXP R_mmperf (SEXP n) {
double GFLOPs = mmperf(asInteger(n));
return ScalarReal(GFLOPs);
}
Here is a simple R script mmperf.R to report GFLOPs for case N = 2000
mmperf <- function (n) {
dyn.load("mmperf.so")
GFLOPs <- .Call("R_mmperf", n)
dyn.unload("mmperf.so")
return(GFLOPs)
}
GFLOPs <- round(mmperf(2000), 2)
cat(paste("GFLOPs =",GFLOPs, "\n"))
Finally there is a simple makefile to generate the shared library mmperf.so:
mmperf.so: mmperf.o
gcc -shared -L$(shell pwd) -Wl,-rpath=$(shell pwd) -o mmperf.so mmperf.o -lopenblas
mmperf.o: mmperf.c
gcc -fpic -O2 -I$(shell Rscript --default-packages=base --vanilla -e 'cat(R.home("include"))') -c mmperf.c
Put all these files under working directory ~/Desktop/dgemm, and compile it:
~/Desktop/dgemm$ make
~/Desktop/dgemm$ readelf -d mmperf.so | grep "NEEDED\|RPATH\|SONAME"
0x00000001 (NEEDED) Shared library: [libopenblas.so.0]
0x00000001 (NEEDED) Shared library: [libc.so.6]
0x0000000f (RPATH) Library rpath: [/home/zheyuan/Desktop/dgemm]
The output reassures us that OpenBLAS is correctly linked, and the run time load path is correctly set.
Part 4: testing OpenBLAS in R
Let's do
~/Desktop/dgemm$ Rscript --default-packages=base --vanilla mmperf.R
Note our script needs only the base package in R, and --vanilla is used to ignore all user settings on R start-up. On my laptop, my program returns:
GFLOPs = 1.11
Oops! This is truely reference BLAS performance not OpenBLAS (which is about 8-9 GFLOPs).
Part 5: Why?
To be honest, I don't know why this happens. Each step seems to work correctly. Does something subtle occurs when R is invoked? For example, any possibility that OpenBLAS library is overridden by reference BLAS at some point for some reason? Any explanations and solutions? Thanks!