I am currently struggling with a fairly large dataset (30M rows, 14+ columns) in R, using the data.table package, on a laptop with 8GB of RAM running 64-bit Windows 10.
I have been hitting memory limits all day, getting the error that R cannot allocate a vector of slightly over 200MB. When I look at the Windows Task Manager, I can see that R is currently using 2-3GB of RAM (about 65% of the total, including the system and some other processes). But when I run gc() in R, the output says that about 7800Mb out of 8012Mb is currently in use.
Running gc() a second time shows no change in the reported memory use, so the first call has already collected everything it could.
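For reference, this is roughly what I run and how I read the output (the column layout below is generic; the 7800Mb figure is from my own console):

```r
gc()  # first call: triggers a garbage collection and prints a summary
gc()  # second call: the "used" column is unchanged, so nothing more was freed

# The summary has rows Ncells (R objects) and Vcells (vector memory), with
# columns "used", "gc trigger" and "max used", each followed by an Mb column.
# If I understand correctly, Windows builds of R can also show a "limit (Mb)"
# column when a memory limit is set via memory.limit().
```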
While processing the data (i.e., executing some data.table command), the process uses practically all of the installed memory, and Windows starts paging to disk.
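The commands themselves are nothing exotic. A made-up example of the kind of grouped operation that triggers the allocation error (my real column names and grouping differ):

```r
library(data.table)

# Illustrative stand-in for my real table (~30M rows, 14+ columns):
DT <- data.table(grp = sample(1e5L, 3e7L, replace = TRUE),
                 val = rnorm(3e7L))

# A grouped aggregation like this is where the "cannot allocate vector"
# error typically appears on my machine.
res <- DT[, .(mean_val = mean(val)), by = grp]
```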
What is the reason for the difference between the gc() output and what I see in Task Manager? Or, to be more precise: why is the number in Task Manager so much lower?
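In case it is relevant, this is how I compare the two numbers from inside R. (memory.size() and memory.limit() are Windows-only helpers; I understand they were removed in R 4.2, but they work in my version.)

```r
memory.size()            # Mb currently committed by this R process
memory.size(max = TRUE)  # peak Mb the process has obtained from Windows
memory.limit()           # allocation ceiling in Mb (~8012 on my machine)

sum(gc()[, 2])           # total Mb of live R objects per gc()'s "used" column
```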