In a numerical physics project of mine, I'd like to compare memory usage of different methods for solving the same problem.
I've found out that I can include <sys/resource.h> and use getrusage() to get the maximum amount of used memory in ru_maxrss (with some caveats that I don't think I need to care about).
For benchmarking, I essentially run a code block like this one for each of the methods I've implemented:
// needs <sys/resource.h>, <chrono>, <iostream>
int minN = 6;
int maxN = 16;
std::chrono::steady_clock::time_point start;
std::chrono::steady_clock::time_point finish;
std::cout << "Naive:" << std::endl;
for (int N = minN; N <= maxN; N += 2) {
    struct rusage usage{};
    start = std::chrono::steady_clock::now();
    // do work...
    finish = std::chrono::steady_clock::now();
    int ret = getrusage(RUSAGE_SELF, &usage);  // returns 0 on success, -1 on error
    long time_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
    long max_ram_kb = usage.ru_maxrss;  // peak resident set size, printed as KB below
    std::cout << "N = " << N << ", time = " << time_ns/1e9 << " s, ram = " << max_ram_kb << " KB" << std::endl;
}
Now, the problem is that ru_maxrss contains the maximum amount of used memory for the whole lifetime of the program, i.e. it is not reduced if a "large" object goes out of scope. Thus, the output of the whole program will look something like this:
Naive:
N = 6, time = 0.022541 s, ram = 8028 KB
N = 8, time = 0.0234674 s, ram = 65360 KB
N = 10, time = 0.373676 s, ram = 135284 KB
N = 12, time = 21.7536 s, ram = 631792 KB
Magnetization:
N = 6, time = 0.000166585 s, ram = 631792 KB
N = 8, time = 0.00158378 s, ram = 631792 KB
N = 10, time = 0.022255 s, ram = 631792 KB
N = 12, time = 0.405172 s, ram = 631792 KB
Momentum:
N = 6, time = 0.000175482 s, ram = 631792 KB
N = 8, time = 0.000766058 s, ram = 631792 KB
N = 10, time = 0.00658272 s, ram = 631792 KB
N = 12, time = 0.0728279 s, ram = 631792 KB
Parity:
N = 8, time = 0.000986243 s, ram = 631792 KB
N = 12, time = 0.0528302 s, ram = 631792 KB
Spin Inversion:
N = 8, time = 0.00111167 s, ram = 631792 KB
N = 12, time = 0.050363 s, ram = 631792 KB
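The plateau in this output can be reproduced in isolation. The following is only a minimal sketch: `peak_rss` and `allocate_and_release` are hypothetical helper names, and the allocation stands in for the real solvers. It reads ru_maxrss before and after a large and a small allocation.

```cpp
#include <sys/resource.h>
#include <cstddef>
#include <vector>

// Peak resident set size of this process as reported by getrusage().
// Returns -1 if the call fails.
long peak_rss() {
    struct rusage usage{};
    if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
    return usage.ru_maxrss;
}

// Hypothetical stand-in for one benchmark step: allocate and touch
// `bytes` bytes, then release them when the vector goes out of scope.
void allocate_and_release(std::size_t bytes) {
    if (bytes == 0) return;
    std::vector<char> block(bytes, 1);      // writing makes the pages resident
    volatile char sink = block[bytes / 2];  // read back so the buffer is used
    (void)sink;
}
```

Calling `peak_rss()` after `allocate_and_release(64 * 1024 * 1024)` and again after `allocate_and_release(1024)` should never show the second reading below the first, because the value is a high-water mark for the whole process.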
Once memory usage has peaked, the reported memory usage of my benchmark is useless. I realize that, in principle, this is how getrusage() is supposed to work. Is there a way to reset this metric? Or can anyone recommend another easy way to measure memory usage from inside the program that does not involve using specific benchmarking libraries?
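One approach that needs no extra library is to run each measurement in a forked child process and read that child's resource usage with wait4(), so every run gets its own ru_maxrss. This is a sketch under assumptions, not a drop-in solution: `peak_rss_of_child` and `touch_32_mib` are hypothetical names, the workload is a placeholder, and timings or results computed in the child would still have to be sent back to the parent (for example through a pipe). Since the child starts as a copy of the parent, its reading also includes whatever the parent had resident at the time of the fork.

```cpp
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// Runs `work` in a forked child and returns the child's ru_maxrss
// (kilobytes on Linux), or -1 if forking, waiting, or the child failed.
long peak_rss_of_child(void (*work)()) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        work();
        _exit(0);  // leave the child without running the parent's cleanup
    }
    int status = 0;
    struct rusage usage{};
    if (wait4(pid, &status, 0, &usage) < 0) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return usage.ru_maxrss;
}

// Hypothetical workload: allocate and touch 32 MiB.
void touch_32_mib() {
    std::vector<char> block(32u * 1024 * 1024, 1);
    volatile char sink = block[block.size() / 2];
    (void)sink;
}
```

A call such as `peak_rss_of_child(touch_32_mib)` would then replace the getrusage(RUSAGE_SELF, ...) call inside the benchmark loop, at the cost of one fork per measurement.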
Regards
PS: Does anyone know whether, or in which cases, ru_maxrss is in B or KB? For N = 8, I store a matrix with 65536 double elements. This matrix should dominate memory usage, and at 8 bytes per double I'd expect it to take up about 65536 × 8 B = 524288 B (512 KiB). My benchmark reports 65360 KB, and the documentation of getrusage() says the result is in KB. That figure is eerily close to the number of matrix elements, 65536. So is the result really in KB, and the resemblance purely a coincidence?
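For the unit question, a quick sanity check on sizes may help. The sketch below uses two hypothetical helpers: `matrix_bytes` computes the expected footprint of the matrix, and `maxrss_to_bytes` normalizes ru_maxrss, given that Linux reports it in kilobytes while macOS reports it in bytes. Treating every non-Apple platform as kilobytes is an assumption of this example.

```cpp
#include <cstddef>

// Size in bytes of a dense matrix of doubles with `elements` entries.
constexpr std::size_t matrix_bytes(std::size_t elements) {
    return elements * sizeof(double);
}

// Converts ru_maxrss to bytes. Linux documents the field in kilobytes,
// macOS in bytes; other platforms are assumed to follow Linux here.
long long maxrss_to_bytes(long ru_maxrss) {
#if defined(__APPLE__)
    return ru_maxrss;
#else
    return static_cast<long long>(ru_maxrss) * 1024;
#endif
}
```

With 8-byte doubles, `matrix_bytes(65536)` is 524288 bytes (512 KiB), so the matrix alone accounts for neither 65360 B nor 65360 KB. The reported value covers the whole process, including anything else that was allocated during the run.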
Update:
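I got what I wanted working by parsing /proc/self/stat. I'll share my updated code below in case anyone finds this in the future. Note that rss, the 24th entry of stat, is the current resident set size in pages, so one must multiply it by the page size (4096 bytes on my system; sysconf(_SC_PAGESIZE) returns the actual value) to get an approximation of the used amount of RAM in bytes. Since this is the current value rather than a peak, it has to be read while the large objects are still alive.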
std::cout << "Naive:" << std::endl;
for (int N = minN; N <= maxN; N += 2) {
    start = std::chrono::steady_clock::now();
    // do work...
    finish = std::chrono::steady_clock::now();
    // needs <fstream>, <sstream>, <string>
    std::ifstream statFile("/proc/self/stat");
    std::string statLine;
    std::getline(statFile, statLine);
    std::istringstream iss(statLine);
    std::string entry;
    long long memUsage = 0;  // rss in pages
    // Splitting on spaces assumes the executable name (field 2) contains no spaces.
    for (int i = 1; i <= 24; i++) {
        std::getline(iss, entry, ' ');
        if (i == 24) {
            memUsage = std::stoll(entry);  // stoll, since memUsage is a long long
        }
    }
    long time_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
    // 4096 assumes 4 KiB pages
    std::cout << "N = " << N << ", time = " << time_ns/1e9 << " s, ram = " << 4096*memUsage/1e9 << " GB" << std::endl;
}
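A slightly more defensive variant of the parsing above, as a sketch: the helper names `parse_rss_pages` and `current_rss_bytes` are made up for this example. It skips past the closing parenthesis of the comm field, so an executable name containing spaces does not shift the field count, and it asks the system for the page size instead of hard-coding 4096.

```cpp
#include <unistd.h>
#include <cstddef>
#include <exception>
#include <fstream>
#include <sstream>
#include <string>

// Extracts field 24 (rss, in pages) from one line of /proc/[pid]/stat.
// Returns -1 if the line is malformed.
long long parse_rss_pages(const std::string& statLine) {
    // Field 2 (comm) is wrapped in parentheses and may contain spaces,
    // so start counting after the last ')'.
    std::size_t close = statLine.rfind(')');
    if (close == std::string::npos) return -1;
    std::istringstream rest(statLine.substr(close + 1));
    std::string field;
    // The first token after ')' is field 3; read up to and including field 24.
    for (int i = 3; i <= 24; ++i) {
        if (!(rest >> field)) return -1;
    }
    try {
        return std::stoll(field);
    } catch (const std::exception&) {
        return -1;
    }
}

// Current resident set size of this process in bytes, or -1 on failure.
long long current_rss_bytes() {
    std::ifstream statFile("/proc/self/stat");
    std::string statLine;
    if (!std::getline(statFile, statLine)) return -1;
    long long pages = parse_rss_pages(statLine);
    if (pages < 0) return -1;
    return pages * sysconf(_SC_PAGESIZE);
}
```

In the loop, `current_rss_bytes()` would replace the manual field loop. As with the original version, the reading reflects memory that is resident at the moment of the call.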