faster fork exec system popen

/* the same as system() but using posix_spawn() which is 12 times faster */
void posix_spawn_test() {
    pid_t pid;
    char * const argv[] = { "./tiny2", NULL };

    if (posix_spawn(&pid, "./tiny2", NULL, NULL, argv, environ) != 0) {
        err(EXIT_FAILURE, "posix_spawn()");
    }

    parent_waitpid(pid);
}
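The snippet above starts a command but does not capture its output. As a rough sketch of how posix_spawn() could also serve a popen()-like use case, the standard file actions API can wire a pipe to the child's standard output. The helper name spawn_capture() and its buffer-based interface are assumptions made for illustration; they are not part of popen-noshell.

```c
#include <spawn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv[0] via posix_spawnp() and copy up to
 * outsz-1 bytes of its standard output into out (NUL-terminated).
 * Returns the child's exit status, or -1 on failure. */
int spawn_capture(char *const argv[], char *out, size_t outsz)
{
    int fds[2];
    pid_t pid;
    posix_spawn_file_actions_t fa;
    size_t used = 0;
    ssize_t n;
    int status, rc;

    if (outsz == 0 || pipe(fds) != 0) return -1;

    /* Actions performed in the child before the exec: make the write end
     * of the pipe its stdout, then drop the original pipe descriptors. */
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, fds[0]);
    posix_spawn_file_actions_addclose(&fa, fds[1]);

    rc = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(fds[1]);                      /* parent keeps only the read end */
    if (rc != 0) { close(fds[0]); return -1; }

    /* Read until the buffer is full or the child closes its stdout. */
    while (used < outsz - 1 &&
           (n = read(fds[0], out + used, outsz - 1 - used)) > 0) {
        used += (size_t)n;
    }
    out[used] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_capture() with an argument vector such as { "echo", "hello", NULL } should leave the command's output in the buffer and return its exit status. If the output does not fit in the buffer, the child may be killed by SIGPIPE, in which case this sketch returns -1.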

This project is now hosted on GitHub: https://github.com/famzah/popen-noshell

Problem definition
As we have already discussed, fork() is slow. What do we do if we want to make many popen() calls and still spend less money on hardware?

The parent process calling popen() communicates with the child process by reading its standard output. Therefore, we cannot use vfork() to speed things up, because it doesn’t allow the child process to close its standard output and duplicate the file descriptor passed from the parent onto its standard output before exec()’uting the command. A child process created by vfork() can only call exec() (or _exit()) right away, nothing more.

If we re-implemented popen() using threads, because creating a thread is very lightweight, we still couldn’t use exec(), because invoking exec() from a thread terminates all other threads in the process, including the parent one.

Problem resolution
We need a fork mechanism which is similar to threads and vfork() but still allows the child to run code other than just exec().

The system call clone() comes to the rescue. Using clone() we create a child process which has the following features:

The child runs in the same memory space as the parent. This means that no memory structures are copied when the child process is created. As a result, any change the child makes to a non-stack variable is visible to the parent process. This is similar to threads, completely different from fork(), and also very dangerous: we don’t want the child to mess up the parent.
The child starts from an entry function, which is called right after the child is created. This is like threads, and unlike fork().
The child has a separate stack space, which is similar to threads and fork(), but entirely different from vfork().
Most importantly: this thread-like child process can call exec().

In a nutshell, by calling clone() in the following way, we create a child process which is very similar to a thread but can still call exec():

pid = clone(fn, stack_aligned, CLONE_VM | SIGCHLD, arg);

The child starts at the function fn(arg). We have allocated some memory for the child’s stack, and the stack pointer passed to clone() must be aligned. Here are some important notes (valid at the time of writing) which I learned by reading the source of libc and the Linux kernel:

On all supported Linux platforms the stack grows down, except for HP PA-RISC. You can grep the kernel source for “STACK_GROWSUP” to get this information.
On all platforms supported by GNU libc, the stack is aligned to 16 bytes, except for the SuperH platform, where it is aligned to 8 bytes. You can grep the glibc source for “STACK_ALIGN” to get this information.
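The two notes above translate into a small amount of pointer arithmetic. The following is a minimal sketch, assuming a downward-growing stack and 16-byte alignment, of how the stack could be prepared before the clone() call. The 64 KB stack size, the macro names and the helpers child_exec() and clone_and_wait() are illustrative assumptions, not code taken from the library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                     /* needed for clone() in <sched.h> */
#endif
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_STACK_SIZE  (64 * 1024)  /* assumed size, for illustration */
#define SKETCH_STACK_ALIGN 16           /* 8 would be enough on SuperH */

/* Child entry point: arg is a NULL-terminated argv array. */
static int child_exec(void *arg)
{
    char *const *argv = arg;

    execvp(argv[0], argv);
    _exit(127);                         /* reached only if exec failed */
}

/* Hypothetical helper: run argv[0] in a clone()d child and return its
 * exit status, or -1 on failure. */
int clone_and_wait(char *const argv[])
{
    char *stack, *top;
    pid_t pid;
    int status;

    stack = malloc(SKETCH_STACK_SIZE);
    if (stack == NULL) return -1;

    /* The stack grows down, so clone() gets the highest address of the
     * block, rounded down to the required alignment. */
    top = (char *)(((uintptr_t)stack + SKETCH_STACK_SIZE)
                   & ~(uintptr_t)(SKETCH_STACK_ALIGN - 1));

    pid = clone(child_exec, top, CLONE_VM | SIGCHLD, (void *)argv);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    /* Not freed on error: the child might still be running on the stack. */
    if (waitpid(pid, &status, 0) != pid) return -1;

    free(stack);                        /* the child has exited by now */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because SIGCHLD is passed as the termination signal, the parent can collect the child with an ordinary waitpid(), and the stack is released only after the child is known to have finished.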

Note that this trick has been tested only on Linux. I failed to make it work on FreeBSD.

Usage
Once this child process is created, we take care not to touch any global variables of the parent process, do some file descriptor magic to bind the standard output of the child process to a file descriptor in the parent, and then execute the given command with its arguments.
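The "file descriptor magic" could look roughly like the sketch below. It is not the library's actual code: the struct sketch_args, the function names and the fixed 64 KB stack are assumptions made for illustration. The child only reads the argument block it is given and writes nothing to shared memory. Since CLONE_FILES is not set, the child gets its own copy of the file descriptor table, so its close() and dup2() calls do not affect the parent.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                     /* needed for clone() in <sched.h> */
#endif
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical argument block handed to the child entry function. */
struct sketch_args {
    int pipe_fds[2];                    /* [0] = read end, [1] = write end */
    char *const *argv;                  /* NULL-terminated command line */
};

static int sketch_child(void *p)
{
    const struct sketch_args *a = p;    /* read-only: memory is shared */

    close(a->pipe_fds[0]);
    if (a->pipe_fds[1] != STDOUT_FILENO) {
        if (dup2(a->pipe_fds[1], STDOUT_FILENO) < 0) _exit(126);
        close(a->pipe_fds[1]);
    }
    execvp(a->argv[0], a->argv);
    _exit(127);                         /* reached only if exec failed */
}

/* Hypothetical helper: run argv[0], copy up to outsz-1 bytes of its
 * stdout into out, and return its exit status (or -1 on failure). */
int clone_capture(char *const argv[], char *out, size_t outsz)
{
    enum { STACK_SIZE = 64 * 1024, STACK_ALIGN = 16 };
    struct sketch_args args;
    char *stack, *top;
    size_t used = 0;
    ssize_t n;
    pid_t pid;
    int status;

    if (outsz == 0 || pipe(args.pipe_fds) != 0) return -1;
    args.argv = argv;

    stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        close(args.pipe_fds[0]);
        close(args.pipe_fds[1]);
        return -1;
    }
    top = (char *)(((uintptr_t)stack + STACK_SIZE)
                   & ~(uintptr_t)(STACK_ALIGN - 1));

    pid = clone(sketch_child, top, CLONE_VM | SIGCHLD, &args);
    close(args.pipe_fds[1]);            /* parent keeps only the read end */
    if (pid < 0) {
        close(args.pipe_fds[0]);
        free(stack);
        return -1;
    }

    while (used < outsz - 1 &&
           (n = read(args.pipe_fds[0], out + used, outsz - 1 - used)) > 0) {
        used += (size_t)n;
    }
    out[used] = '\0';
    close(args.pipe_fds[0]);

    if (waitpid(pid, &status, 0) != pid) return -1;
    free(stack);                        /* safe: the child has exited */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The argument block lives on the parent's stack, which is safe here only because the parent does not return before waitpid() has confirmed that the child is gone.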

You will find detailed examples and use-cases in the source code. A very simplified example follows with no error checks:

fp = popen_noshell("ls", (const char * const *)argv, "r", &pclose_arg, 0);
while (fgets(buf, sizeof(buf)-1, fp)) {
    printf("Got line: %s", buf);
}
status = pclose_noshell(&pclose_arg);

There is a more compatible version of popen_noshell() which accepts the command and its arguments as one whole string, but its usage is discouraged, because it emulates only simple shell argument parsing, and does so very naively.
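To illustrate why naive argument parsing is risky, here is a small sketch of a whitespace-only splitter. It is not the library's implementation; the function naive_split() is hypothetical and deliberately ignores quotes, escapes, globbing and variable expansion.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical splitter: cuts cmd in place at spaces and tabs and returns
 * a malloc()ed, NULL-terminated argv array whose entries point into cmd.
 * Stores the number of arguments in *argc_out. Returns NULL on
 * allocation failure. The caller frees the returned array. */
char **naive_split(char *cmd, int *argc_out)
{
    size_t cap = 8;
    size_t n = 0;
    char **argv = malloc(cap * sizeof(*argv));
    char *p = cmd;

    if (argv == NULL) return NULL;

    for (;;) {
        p += strspn(p, " \t");          /* skip leading separators */
        if (*p == '\0') break;

        if (n + 1 >= cap) {             /* keep room for the final NULL */
            char **bigger;
            cap *= 2;
            bigger = realloc(argv, cap * sizeof(*argv));
            if (bigger == NULL) { free(argv); return NULL; }
            argv = bigger;
        }

        argv[n++] = p;                  /* start of the next argument */
        p += strcspn(p, " \t");         /* find its end */
        if (*p == '\0') break;
        *p++ = '\0';                    /* terminate it in place */
    }

    argv[n] = NULL;
    *argc_out = (int)n;
    return argv;
}
```

With this splitter, a command string containing a quoted argument with a space in it, such as "my file" in double quotes, comes out as two separate arguments with the quote characters still attached. A real shell would pass it as one argument, which is why passing an explicit argv array is the safer interface.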

Benchmark results
I’ve done several tests of how fast popen_noshell() is compared to popen() and even a bare fork()+exec(). All the results are similar, and therefore I’m publishing only one of the benchmark results:

Test	Uses pipes	User CPU	System CPU	Total CPU	Slower by
vfork() + exec(), standard Libc	No	7.4	1.6	9.0	–
the new noshell, default clone(), compat=1	Yes	7.7	2.1	9.7	8%
the new noshell, default clone(), compat=0	Yes	7.8	2.0	9.9	9%
posix_spawn() + exec() no pipes, standard Libc	No	9.4	2.0	11.5	27%
the new noshell, posix_spawn(), compat=0	Yes	9.6	2.7	12.3	36%
the new noshell, posix_spawn(), compat=1	Yes	9.6	2.7	12.3	37%
fork() + exec(), standard Libc	No	40.5	43.8	84.3	836%
the new noshell, debug fork(), compat=1	No	41.6	45.2	86.8	863%
the new noshell, debug fork(), compat=0	No	41.6	45.3	86.9	865%
system(), standard Libc	No	67.3	48.1	115.4	1180%
popen(), standard Libc	Yes	70.4	47.1	117.5	1204%

Here are the resources which you can download:

All benchmark tests which I did. (42K)

I would appreciate any comments on the library.


posix_spawn() performance benchmarks and usage examples

A much faster popen() and system() implementation for Linux