Virtual memory—user-space cache

2021-02-01

An in-memory LRU cache of temporary results is common in all kinds of applications. However, the size of such a cache is usually hard-coded, configured, or computed from the available physical memory.

All of these options are unsatisfactory compromises, because processes that keep such in-memory caches don’t cooperate with each other. If the limits are too low, the processes won’t use the hardware to its full extent. If the limits are too high, they’ll fight over RAM, unnecessarily swapping pages in and out.

Instead, the OS should allow a user-space process to mark anonymous memory pages as ‘unneeded’, then acquire them later and check that their content is still intact.

For example: an image viewer decompresses the next and previous images, caching them in memory. If memory is tight, it would rather throw the cached images away and load them afresh from disk if they happen to be needed again. The same applies to, for example, caching results deserialized from an on-disk database.

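This may be achievable on BSD by using madvise with MADV_FREE, though it’s unclear if there’s a race-free way to reacquire the range and check its status. Linux has supported MADV_FREE since kernel 4.5, with the same open question. Windows 8.1 and later provide OfferVirtualMemory and ReclaimVirtualMemory, where the reclaim call reports whether the contents were discarded in the meantime.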
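As a rough illustration of the madvise route, here is a minimal Python sketch. It assumes a platform where Python’s mmap module exposes MADV_FREE (Python 3.8+), where a discarded page reads back as zeros, and where a write to a page that has not yet been reclaimed cancels the pending free, which is how the Linux madvise man page describes it. The DiscardableBuffer class, its per-page marker byte, and the cached_or_recomputed helper are hypothetical names invented for this example, and the write-then-check step is only a plausible way to reacquire a page, not a verified race-free protocol.

```python
import mmap

PAGE = mmap.PAGESIZE
# MADV_FREE is only exposed where the platform defines it.
MADV_FREE = getattr(mmap, "MADV_FREE", None)


class DiscardableBuffer:
    """Anonymous memory that the kernel may drop while it is released."""

    def __init__(self, payload: bytes):
        # Byte 0 of every page holds a non-zero marker, byte 1 is a scratch
        # byte, and the payload is spread over the remaining bytes.
        self._chunk = PAGE - 2
        self._length = len(payload)
        pages = max(1, -(-len(payload) // self._chunk))  # ceiling division
        self._map = mmap.mmap(-1, pages * PAGE)
        for i in range(pages):
            base = i * PAGE
            part = payload[i * self._chunk:(i + 1) * self._chunk]
            self._map[base] = 1
            self._map[base + 2:base + 2 + len(part)] = part

    def release(self) -> None:
        """Mark the pages as unneeded; they may be reclaimed under pressure."""
        if MADV_FREE is not None:
            try:
                self._map.madvise(MADV_FREE)
            except OSError:
                pass  # advice rejected here; the pages simply stay resident

    def acquire(self):
        """Return the payload if every page survived, otherwise None."""
        out = bytearray()
        for base in range(0, len(self._map), PAGE):
            # Dirty the page first (assumed to cancel a pending free),
            # then check whether the marker is still there.
            self._map[base + 1] = 1
            if self._map[base] == 0:
                return None  # page was dropped and refilled with zeros
            out += self._map[base + 2:base + PAGE]
        return bytes(out[:self._length])


def cached_or_recomputed(buf: DiscardableBuffer, recompute):
    """Use the cached bytes when intact, otherwise rebuild them."""
    data = buf.acquire()
    return data if data is not None else recompute()
```

An image viewer could call release() on a decoded neighbouring image and later pass a decode-from-disk function as recompute. Once acquire() has returned None the buffer should be rebuilt rather than reused, since its markers are gone.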

Memory mapped i/o

Memory mapped files are often presented as a ‘convenience’: a way to treat files as memory without the need to rewrite existing code to work on streamed data. When it comes to non-sequential i/o, however, their main advantage is their ability to use the versatile filesystem cache that’s already provided by the OS.
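To make the non-sequential case concrete, the following Python sketch reads fixed-size records at arbitrary positions through a read-only mapping, so the only cache involved is the OS page cache. The record size, the file layout, and the function names write_demo_file and read_records are assumptions made up for this example.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed record size, in bytes


def write_demo_file(count: int) -> str:
    """Create a temporary file holding `count` fixed-size records."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(count):
            f.write(b"%015d\n" % i)  # 15 digits plus a newline = 16 bytes
    return path


def read_records(path: str, indices):
    """Fetch records in arbitrary order without any user-space cache."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are faulted in on first touch
        # and stay in the OS page cache for later accesses.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return [view[i * RECORD_SIZE:(i + 1) * RECORD_SIZE] for i in indices]
```

A long-running program would normally keep the mapping open rather than recreate it per call; it is opened inside the function here only to keep the sketch self-contained.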

For example, a typical database engine (e.g. SQLite) implements its own user-space cache. Could it do better with memory mapped i/o instead? It comes down to the following commonly cited drawbacks of memory mapped i/o:

It’d be interesting to see whether these issues can be overcome by introducing appropriate OS APIs or, alternatively, how advantageous a memory mapped approach would be where the above points aren’t an issue.