/contrib/famzah

Enthusiasm never stops


2 Comments

fork() gets slower as parent process uses more memory

Background information
Forking is an important, fundamental part of Unix, critical to the support of its design philosophy. For example, if a process wants to execute a command, it has to fork() a child process which then immediately calls exec(). And since the philosophy of Unix involves executing many small commands/programs, in order to achieve something meaningful, it turns out that fork() is called pretty often.

There are two main fork() patterns when a parent process wants to execute a command:

  • The parent does not need to communicate with the child process in any way – the child process executes, and the parent gets its exit code back. No input/output with the child is done at all, only the inital command line arguments are passed.
  • The parent needs to communicate with the child process – either it needs to supply something at the standard input or some other file descriptor of the child, or it wants to get some information from the standard output or some other file descriptor of the child, or both.

For the case when there is no communication involved, Unix guys developed the vfork() call. It is a very light-weight version of fork(), very close to threading. The gotcha here is that a child process which was created by vfork() cannot modify any variables in its space at all or do something else, because it has no own stack. The child process is only allowed to call exec() right after it was born, nothing more. This speeds up the usual fork()-then-exec() model, because often the parent does not need to communicate with the child process – the parent just wants the command executed with the given command line arguments.

For all other cases when the parent communicates with the child internally using file descriptors (anonymous pipes, etc.), the standard fork() system call is used.

Problem definition
It turns out that when the parent process allocates some memory, the fork() call takes longer to execute if a bigger amount of this allocated memory is being used, that is – if the parent process writes something there. Linux and probably the other Unix systems employ a copy-on-write feature and don’t physically copy the allocated memory from the parent into the child process initially on each fork(). Not until the child modifies it. Nevertheless, the fork() call gets slower and slower as more and more memory is being used (not just allocated) in the parent process. It seems that even though the data of the allocated/used memory itself is not being copied, thanks to the copy-on-write feature, the internal virtual memory structures in the kernel, which hold the information about how much and what memory the parent process has allocated, are being copied in an inefficient way while the child process is being created by fork().

Currently available options
So why don’t we then just vfork() always? It is very fast. And the answer is – because we cannot communicate with the child process when it was created by vfork().

Okay, so why don’t we use threads then? They are similar to vfork(), only that the child process (thread) has its own stack and shares the data segment (allocated memory) of the parent process. We can even use these shared data variables for inter-process communication. And the answer is – because a thread cannot invoke exec() by definition. This is not supported by the threading libraries, as required by POSIX.1.

Talk to me in numbers, how slower does fork() get in regards to memory allocation and usage
Here are some benchmark results, the forking is done a few thousand times, in order to accumulate an accountable CPU time. The program which is being exec()’ed is a very tiny binary which contains only two system calls – write(“Hello world”) and then _exit(0). The benchmark results follow:

System info Allocated memory Usage ratio vfork() + exec() fork() + exec()
Linux 2.6.28-15-generic i686 20MB 1:2 (10MB) 1.49 12.08
Linux 2.6.28-15-generic i686 20MB 1:1 (20MB) 1.53 21.60
Linux 2.6.28-15-generic i686 40MB 1:2 (20MB) 1.59 21.23
FreeBSD 7.1-RELEASE-p4 i386 20MB 1:2 (10MB) 2.26 20.22
FreeBSD 7.1-RELEASE-p4 i386 40MB 1:2 (20MB) 2.44 33.94

As we can see from the test results, the vfork() call is not affected by the amount of memory usage. This does not apply for fork() though. On Linux we observe almost two times more CPU usage as the memory usage is increased twice. On FreeBSD the results are similar, only a bit better – if the memory usage is increased twice, the CPU usage of fork() is increased with 50% (vs. 100% on Linux). Even though, the difference in CPU time between the vfork() and fork() calls is significant on both operating systems – fork() is more than 1000% slower.

You can read my next article which describes a solution for Linux which allows a parent process to communicate with its child, similar to fork(), but is as fast as vfork(). The article also contains more detailed information about the benchmark tests we did, in order to populate the above table.