PHP-FPM keeps a shared Opcache memory region between the parent process and all the child processes in a pool. The idea is to compile the source code once and then reuse the resulting byte code directly in every child process. This is efficient, but as a system administrator I recently stumbled across a problem: how do you find out the real memory usage of the Opcache at the operating system level?
I thought a simple “ps” listing would reveal the memory usage, since the parent process created the shared anonymous mmap() region and the usage would therefore be accounted to it. Linux doesn’t work this way. To debug this easily, I created a simple program which does the following:
- The parent process creates a shared anonymous memory region using mmap() with a size of 2000 MB. The parent process does not use the memory region in any way. It doesn’t change any data in it.
- Two child processes are fork()’ed and then:
  - The first process writes 500 MB of data at the beginning of the shared memory region inherited from the parent process.
  - The second process writes 1000 MB of data at the beginning of the shared memory region inherited from the parent process.
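The steps above can be sketched roughly as follows. This is not the original test program, only a minimal Python approximation of it; the function name `run_experiment` and its parameters (including `hold_seconds`, which keeps the children alive long enough to inspect them with “ps”) are made up for illustration.

```python
import mmap
import os
import time

MB = 1024 * 1024


def run_experiment(region_mb=2000, writes_mb=(500, 1000), hold_seconds=0):
    """Map a shared anonymous region, then fork one child per entry in
    writes_mb. Each child writes that many MB from the start of the region.
    (Hypothetical helper, not the author's program.)"""
    # fileno=-1 means "no backing file"; together with MAP_SHARED this is a
    # shared anonymous mapping. The parent never touches the pages.
    region = mmap.mmap(-1, region_mb * MB,
                       flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    children = []
    for size_mb in writes_mb:
        pid = os.fork()
        if pid == 0:
            # Child: the mapping is inherited across fork(), nothing new
            # is allocated here.
            chunk = b"x" * MB
            for i in range(size_mb):
                region[i * MB:(i + 1) * MB] = chunk
            time.sleep(hold_seconds)  # stay alive so "ps" can show us
            os._exit(0)
        children.append(pid)
    for pid in children:
        os.waitpid(pid, 0)
    return region
```

Calling `run_experiment(hold_seconds=60)` leaves a one-minute window in which the parent and both children can be inspected from another terminal.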
Here is what the process list looks like:
A quick explanation of this process list snapshot:
- VSZ (virtual size) of the parent and child processes is 2000 MB because the parent process has allocated 2000 MB of anonymous memory using mmap(). No additional allocations were made by the child processes as they were passed a reference to the anonymously shared memory in the parent process. Therefore the virtual memory footprint of all processes is the same.
- RSS (resident set size, or simply “the real usage”) is:
  - Almost none for the parent process because it never used any memory. It only “requested” the memory block by mmap().
  - 500 MB for the first child process because it wrote 500 MB of data at the beginning of the shared memory region.
  - 1000 MB for the second child process because it wrote 1000 MB of data at the beginning of the shared memory region.
- The “free -m” command shows that 1012 MB of anonymously shared memory is being used.
So far things seem fairly logical: we can roughly determine the real usage of the shared memory region by looking at the child processes. However, this is not entirely accurate either. If the children write to completely different parts of the region, we need to sum their usage. If they write to the very same part, we need to take the max() value.
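Both cases are really the same calculation: the size of the union of the byte ranges that were written. A small sketch of that arithmetic follows; the helper name `touched_mb` and the second set of ranges are invented for illustration, only the 500 MB and 1000 MB writes come from the experiment above.

```python
def touched_mb(ranges):
    """Return the total size of the union of (start_mb, end_mb) ranges."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(ranges):
        if current_end is None or start > current_end:
            # Disjoint from the previous run: close it and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Both children write from offset 0: the union equals the max.
print(touched_mb([(0, 500), (0, 1000)]))     # 1000
# Hypothetical case with disjoint writes: the union equals the sum.
print(touched_mb([(0, 500), (1000, 2000)]))  # 1500
```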
The pmap command doesn’t provide any additional information and shows the same values that we see in the “ps” output:
Things get even messier when the child processes terminate (and get replaced by new ones which never touched the shared anonymous memory). Here is what the process list looks like:
The RSS (resident set size, or simply “the real usage”) of the parent process continues to show no usage. But the anonymous memory region is actually in use because the child processes wrote data to it. And the region is not automatically released, because the parent process is still alive and still has it mapped. The “free -m” command clearly shows that there are 1000 MB of data stored in anonymous shared memory.
How can we reliably find out the memory usage of the anonymous shared region and account it to a given process?
We will look at /proc/[pid]/maps:
A file containing the currently mapped memory regions and their access permissions. See mmap(2) for some further information about memory mappings.
…
If the pathname field is blank, this is an anonymous mapping as obtained via mmap(2). There is no easy way to coordinate this back to a process’s source, short of running it through gdb(1), strace(1), or similar.
Wikipedia gives the following additional information:
When “/dev/zero” is memory-mapped, e.g., with mmap(), to the virtual address space, it is equivalent to using anonymous memory; i.e. memory not connected to any file.
Now we know how to find out the virtual address of the shared anonymous memory-mapped region. Here I demonstrate two different ways of obtaining the address:
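For scripting, the same lookup can also be done by parsing /proc/[pid]/maps directly. The sketch below is not necessarily either of the two ways referred to above. It relies on the fact that Linux lists a shared anonymous mapping with the “s” (shared) permission flag and a pathname beginning with /dev/zero; all function names are hypothetical.

```python
def parse_maps(text):
    """Parse the text of /proc/[pid]/maps into a list of dicts."""
    regions = []
    for line in text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        regions.append({
            "start": start,
            "end": end,
            "perms": fields[1],
            "path": fields[5].strip() if len(fields) > 5 else "",
        })
    return regions


def shared_zero_regions(text):
    """Keep shared mappings whose pathname starts with /dev/zero."""
    return [r for r in parse_maps(text)
            if r["perms"].endswith("s") and r["path"].startswith("/dev/zero")]


def shared_zero_regions_of(pid):
    """Read /proc/<pid>/maps and return its shared /dev/zero mappings."""
    with open(f"/proc/{pid}/maps") as f:
        return shared_zero_regions(f.read())
```

Each returned entry carries the `start` and `end` addresses of one region, so `end - start` is its virtual size in bytes.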
The man page of tmpfs gives further insight:
An internal shared memory filesystem is used for […] shared anonymous mappings (mmap(2) with the MAP_SHARED and MAP_ANONYMOUS flags).
…
The amount of memory consumed by all tmpfs filesystems is shown in the Shmem field of /proc/meminfo and in the shared field displayed by free(1).
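The Shmem field mentioned in this excerpt can be read programmatically as well. A minimal sketch, with hypothetical function names, that extracts it from /proc/meminfo:

```python
def shmem_kb(meminfo_text):
    """Return the Shmem value in kB from the text of /proc/meminfo,
    or None if the field is missing."""
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        # Exact match, so ShmemHugePages and similar fields are skipped.
        if key == "Shmem":
            return int(rest.split()[0])
    return None


def current_shmem_kb():
    """Read the system-wide Shmem value from /proc/meminfo."""
    with open("/proc/meminfo") as f:
        return shmem_kb(f.read())
```

Note that this is a system-wide total covering all tmpfs and shared memory users, so it cannot attribute usage to a single process on its own.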
We verify that the memory-mapped region is a “tmpfs” file:
💚 We can then finally get the real memory usage of this shared anonymous memory block in terms of VSS (virtual memory size) and RSS (resident set size, or simply “the real usage”):
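One way to script such a measurement, offered here as an assumption rather than as the exact commands used above, is to stat() the region through its /proc/[pid]/map_files entry. The apparent file size then corresponds to the virtual size, and the allocated blocks correspond to the pages that actually hold data. This assumes the kernel exposes /proc/[pid]/map_files and that the caller has enough privileges (typically root) to follow the link. The helper names are hypothetical.

```python
import os


def map_files_path(pid, start, end):
    """Build the /proc/[pid]/map_files entry name for an address range."""
    return f"/proc/{pid}/map_files/{start:x}-{end:x}"


def file_usage(path):
    """Return (apparent_size_bytes, allocated_bytes) for a path.

    os.stat() follows symlinks, so for a map_files entry this describes
    the backing file rather than the link itself.
    """
    st = os.stat(path)
    # st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512
```

Given the start and end addresses of the region from /proc/[pid]/maps, `file_usage(map_files_path(pid, start, end))` returns both numbers in bytes.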
Since we have access to the memory region as a file, we can even read the contents of this memory-mapped region: