/contrib/famzah

Enthusiasm never stops


Leave a comment

Debug the usage of anonymously shared memory regions

PHP-FPM keeps a shared Opcache memory between the parent process and all its child processes in a pool. The idea is to compile source code once and then reuse it in all child processes directly as byte code. This is efficient but as a System administrator I recently stumbled across a problem – how to find out the real memory usage by the Opcache in the operating system?

I thought a simple “ps” list would reveal the memory usage because it would be accounted to the parent process, because the parent process created the anonymously shared mmap() region. Linux doesn’t work this way. In order to debug this easily, I created a simple program which does the following:

  • The parent process creates a shared anonymous memory region using mmap() with a size of 2000 MB. The parent process does not use the memory region in any way. It doesn’t change any data in it.
  • Two child processes are fork()’ed and then:
    • The first process writes 500 MB of data in the beginning of shared memory region passed by the parent process.
    • The second process writes 1000 MB of data in the beginning of the shared memory region passed by the parent process.

Here is how the process list looks like:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
famzah 9256 0.0 0.0 2052508 816 pts/15 S+ 18:00 0:00 ./a.out # parent
famzah 9257 10.7 10.1 2052508 512008 pts/15 S+ 18:00 0:01 \_ ./a.out # child 1
famzah 9258 29.0 20.2 2052508 1023952 pts/15 S+ 18:00 0:03 \_ ./a.out # child 2
famzah@vbox64:~$ free -m
total used free shared buff/cache available
Mem: 4940 549 1943 1012 2447 3097
view raw ps output hosted with ❤ by GitHub

A quick explanation of this process list snapshot:

  • VSZ (virtual size) of the parent and child processes is 2000 MB because the parent process has allocated 2000 MB of anonymous memory using mmap(). No additional allocations were made by the child processes as they were passed a reference to the anonymously shared memory in the parent process. Therefore the virtual memory footprint of all processes is the same.
  • RSS (resident set size, or simply “the real usage”) is:
    • Almost none for the parent process because it never used any memory. It only “requested” the memory block by mmap().
    • 500 MB for the first child processes because it wrote 500 MB of data at the beginning of the shared memory region.
    • 1000 MB for the second child processes because it wrote 1000 MB of data at the beginning of the shared memory region.
  • The “free -m” command shows that 1012 MB of anonymously shared memory is being used.

So far things seem kind of logical. We can roughly determine the real usage of the shared memory region by looking at the child processes. This however is also not really true because if they write at completely different regions in the anonymous memory, we would need to sum their usage. If they write to the very same memory region, we need to look at the max() value.

The pmap command doesn’t provide any additional information and shows the same values that we see in the “ps” output:

famzah@vbox64:~$ pmap -XX 9256
9256: ./a.out
Address Perm Offset Device Inode Size KernelPageSize MMUPageSize Rss Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty Referenced Anonymous LazyFree AnonHugePages ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked VmFlagsMapping
7f052ea9b000 rw-s 00000000 00:05 733825 2048000 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 rd wr sh mr mw me ms sd zero (deleted)
famzah@vbox64:~$ pmap -XX 9257
9257: ./a.out
Address Perm Offset Device Inode Size KernelPageSize MMUPageSize Rss Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty Referenced Anonymous LazyFree AnonHugePages ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked VmFlagsMapping
7f052ea9b000 rw-s 00000000 00:05 733825 2048000 4 4 512000 256000 0 512000 0 0 512000 0 0 0 0 0 0 0 0 0 rd wr sh mr mw me ms sd zero (deleted)
famzah@vbox64:~$ pmap -XX 9258
9258: ./a.out
Address Perm Offset Device Inode Size KernelPageSize MMUPageSize Rss Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty Referenced Anonymous LazyFree AnonHugePages ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked VmFlagsMapping
7f052ea9b000 rw-s 00000000 00:05 733825 2048000 4 4 1024000 768000 0 512000 512000 0 1024000 0 0 0 0 0 0 0 0 0 rd wr sh mr mw me ms sd zero (deleted)
view raw pmap output hosted with ❤ by GitHub

Things get even more messy when the child processes terminate (and get replaced by new ones which never touched the shared anonymous memory). Here is how the process list looks like:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
famzah 9256 0.0 0.0 2052508 816 pts/15 S+ 18:00 0:00 ./a.out # parent
famzah@vbox64:~$ free -m
total used free shared buff/cache available
Mem: 4940 549 1943 1012 2447 3097
view raw ps output hosted with ❤ by GitHub

The RSS (resident set size, or simply “the real usage”) of the parent process continues to show no usage. But the anonymous memory region is actually used because the child processes wrote data in it. And the region is not automatically free()’d, because the parent process is still alive. The “free -m” command clearly shows that there are 1000 MB of data stored in anonymous shared memory.

How can we reliably find out the memory usage of the anonymous shared region and account it to a given process?

We will look at /proc/[pid]/maps:

A file containing the currently mapped memory regions and their access permissions. See mmap(2) for some further information about memory mappings.

If the pathname field is blank, this is an anonymous mapping as obtained via mmap(2). There is no easy way to coordinate this back to a process’s source, short of running it through gdb(1), strace(1), or similar.

Wikipedia gives the following additional information:

When “/dev/zero” is memory-mapped, e.g., with mmap(), to the virtual address space, it is equivalent to using anonymous memory; i.e. memory not connected to any file.

Now we know how to find out the virtual address of the anonymously memory-mapped region. Here I demostrate two different ways of obtaining the address:

famzah@vbox64:~$ cat /proc/9256/maps | grep /dev/zero
7f052ea9b000-7f05aba9b000 rw-s 00000000 00:05 733825 /dev/zero (deleted)
famzah@vbox64:~$ ls -la /proc/9256/map_files/ | grep /dev/zero # same region of 7f052ea9b000-7f05aba9b000
lrw——- 1 famzah famzah 64 Nov 11 18:21 7f052ea9b000-7f05aba9b000 -> '/dev/zero (deleted)'

The man page of tmpfs gives further insight:

An internal shared memory filesystem is used for […] shared anonymous mappings (mmap(2) with the MAP_SHARED and MAP_ANONYMOUS flags).

The amount of memory consumed by all tmpfs filesystems is shown in the Shmem field of /proc/meminfo and in the shared field displayed by free(1).

We verify that the memory-mapped region is a “tmpfs” file:

famzah@vbox64:~$ sudo stat -Lf /proc/9256/map_files/7f052ea9b000-7f05aba9b000
File: "/proc/9256/map_files/7f052ea9b000-7f05aba9b000"
ID: 0 Namelen: 255 Type: tmpfs

💚 We can then finally get the real memory usage of this shared anonymous memory block in terms of VSS (virtual memory size) and RSS (resident set size, or simply “the real usage”):

# stat doesn't give us the real usage, only the virtual
famzah@vbox64:~$ sudo stat -L /proc/9256/map_files/7f052ea9b000-7f05aba9b000
File: /proc/9256/map_files/7f052ea9b000-7f05aba9b000
Size: 2097152000 Blocks: 2048000 IO Block: 4096 regular file
Device: 5h/5d Inode: 733825 Links: 0
Access: (0777/-rwxrwxrwx) Uid: ( 1000/ famzah) Gid: ( 1000/ famzah)
# VSS (virtual memory size)
famzah@vbox64:~$ sudo du –apparent-size -BM -D /proc/9256/map_files/7f052ea9b000-7f05aba9b000
2000M /proc/9256/map_files/7f052ea9b000-7f05aba9b000
# RSS (resident set size, or simply "the real usage")
famzah@vbox64:~$ sudo du -BM -D /proc/9256/map_files/7f052ea9b000-7f05aba9b000
1000M /proc/9256/map_files/7f052ea9b000-7f05aba9b000

Since we have access to the memory region as a file, we can even read this memory mapped region:

famzah@vbox64:~$ sudo cat /proc/9256/map_files/7f052ea9b000-7f05aba9b000 | wc -c
2097152000