Memory-mapping, e.g. via Python's numpy.memmap, works, albeit temporarily; once pagefile capacity is exceeded, the arrays are silently unmapped from the pagefile. Re-mapping each time is undesirable; I need persistence. Further, I don't know how to view the pagefile, i.e. see what's on it.

Intended use: using SSD pagefile as 'pseudo-RAM', w/ 10% of RAM's read speed to accelerate deep learning by loading an entire dataset into memory (but reading only 500MB at a time).

How can this be accomplished? Help is appreciated.


SPECS:
  • System: Win-10 OS, ASUS ROG Strix GL702VSK
  • SSD: 512GB, 3.5GB/s read speed -- NVMe PCIe 970 PRO
  • Pagefile: 80GB, on C-drive (SSD drive, system drive)
  • RAM: 24GB DDR4 2.4-GHz
  • CPU: i7-7700HQ 2.8 GHz

1 Answer

Storing data in the page/swap file (pagefile.sys on Windows) means storing it in virtual memory. If that's really what you want, then you're already doing it whenever you allocate an array in the usual way.

Virtual RAM, like physical RAM, doesn't survive a reboot. There's no way to store data permanently in the page file. It could technically be done because it is a file on a persistent medium, but it just isn't meant for that. Its purpose is to simulate physical RAM.

It sounds like what you really want is to store your numpy array not in the page file, but in an ordinary disk file – the opposite of your title.

I've never done this, but according to the documentation you linked,

An alternative to using this subclass is to create the mmap object yourself, then create an ndarray with ndarray.__new__ directly, passing the object created in its ‘buffer=’ parameter.

which means that you ought to be able to create the array data like this:

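import mmap
import numpy as np
n_bytes = 123456 * 4  # 123456 float32 values, 4 bytes each
file = open('backing_file', 'x+b')  # a writable mapping needs read+write access
file.truncate(n_bytes)  # the file must be at least as large as the mapping
mapped_data = mmap.mmap(file.fileno(), n_bytes, access=mmap.ACCESS_WRITE)
array = np.ndarray(shape=(123456,), buffer=mapped_data, dtype='float32')
# fill in the array, then call mapped_data.flush() to push the changes to disk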

and then, on a subsequent run, map the array into memory like this:

import mmap
import numpy as np
file = open('backing_file', 'rb')
mapped_data = mmap.mmap(file.fileno(), 123456 * 4, access=mmap.ACCESS_READ)
array = np.ndarray(shape=(123456,), buffer=mapped_data, dtype='float32')
# use the array (it is read-only under ACCESS_READ)

The startup time of subsequent runs will be very fast; the array data will be paged in from disk when it's read.
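For the use case in the question (keep the whole dataset on the SSD, read a few hundred MB at a time), numpy.memmap on an ordinary file should behave the same way as the mmap code above. Here's a minimal sketch, untested on your setup; the function names, the file name, the array shape, and the placeholder data are all made up for illustration.

```python
import os
import tempfile

import numpy as np


def create_backing_file(path, shape, dtype="float32"):
    """Create a file-backed array, fill it, and flush it to disk."""
    arr = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    # Placeholder data; a real dataset would be written here instead.
    arr[:] = np.arange(arr.size, dtype=dtype).reshape(shape)
    arr.flush()
    del arr  # drop the reference so the mapping can be released


def iter_batches(path, shape, batch_rows, dtype="float32"):
    """Yield in-RAM copies of consecutive row blocks of the backing file."""
    arr = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    for start in range(0, shape[0], batch_rows):
        # np.array(...) copies the slice into a regular ndarray, so only
        # the pages belonging to this batch have to be read from disk.
        yield np.array(arr[start:start + batch_rows])


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "backing_file")
    create_backing_file(path, shape=(1000, 8))
    for batch in iter_batches(path, shape=(1000, 8), batch_rows=256):
        print(batch.shape)
```

The file persists between runs, so only create_backing_file has to be skipped the second time; the shape and dtype have to be remembered separately because a raw memmap file stores no header.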

Instead of mmap.ACCESS_READ, you could pass mmap.ACCESS_WRITE (in which case any changes to the in-memory array will propagate to disk; the file must then be opened with 'r+b' rather than 'rb'), or mmap.ACCESS_COPY (in which case changes to the in-memory array will be allowed, but they will not be written to disk and will be lost when the process exits).
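To make the difference between the access modes concrete, here's a small sketch that writes to the same file through an ACCESS_COPY mapping and then through an ACCESS_WRITE mapping, and checks what ended up on disk. The helper names, the element count N, and the temporary file are hypothetical, chosen only for the demo.

```python
import mmap
import os
import tempfile

import numpy as np

N = 16  # hypothetical element count for the demo


def open_mapped(path, access):
    """Map an existing float32 file with the given mmap access mode."""
    # ACCESS_WRITE needs a file handle opened for reading and writing.
    mode = "r+b" if access == mmap.ACCESS_WRITE else "rb"
    with open(path, mode) as f:
        mapped = mmap.mmap(f.fileno(), N * 4, access=access)
    # The mapping stays valid after the file object is closed.
    return mapped, np.ndarray(shape=(N,), dtype="float32", buffer=mapped)


def first_value_on_disk(path):
    """Read the first float32 straight from the file, bypassing any mapping."""
    return float(np.fromfile(path, dtype="float32", count=1)[0])


path = os.path.join(tempfile.mkdtemp(), "backing_file")
np.zeros(N, dtype="float32").tofile(path)

# ACCESS_COPY: the write succeeds in memory but never reaches the file.
mapped, arr = open_mapped(path, mmap.ACCESS_COPY)
arr[0] = 1.0
del arr  # release the buffer before closing the mapping
mapped.close()
copy_result = first_value_on_disk(path)

# ACCESS_WRITE: the write is propagated to the file.
mapped, arr = open_mapped(path, mmap.ACCESS_WRITE)
arr[0] = 2.0
mapped.flush()
del arr
mapped.close()
write_result = first_value_on_disk(path)
```

With ACCESS_READ the array is read-only, so the same assignment would raise an error instead of modifying anything.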

Here's the documentation for the mmap module.

benrg
  • 866