Enthusiasm never stops

A much faster popen() and system() implementation for Linux


This project is now hosted on GitHub: https://github.com/famzah/popen-noshell

Problem definition
As we already discussed, fork() is slow. What do we do if we want to make many popen() calls without spending more money on hardware?

The parent process calling the popen() function communicates with the child process by reading its standard output. Therefore, we cannot use vfork() to speed things up, because it doesn’t allow the child process to close its standard output and duplicate the file descriptor passed from the parent onto its standard output before calling exec() for the command. A child process created by vfork() can only call exec() right away, nothing more.

Creating a thread is very lightweight, so we could try to re-implement popen() using threads. But then we couldn’t use exec(), because invoking exec() from a thread terminates the execution of all other threads, including the parent one.

Problem resolution
We need a fork mechanism which is similar to threads and vfork() but still allows the child to run code other than just exec().

The system call clone() comes to the rescue. Using clone() we create a child process which has the following features:

  • The child runs in the same memory space as the parent. This means that no memory structures are copied when the child process is created. As a result, any change the child makes to a non-stack variable is visible to the parent process. This is similar to threads and therefore completely different from fork(). It is also very dangerous: we don’t want the child to mess up the parent.
  • The child starts from an entry function which is being called right after the child was created. This is like threads, and unlike fork().
  • The child has a separate stack space which is similar to threads and fork(), but entirely different to vfork().
  • The most important: This thread-like child process can call exec().

In a nutshell, by calling clone in the following way, we create a child process which is very similar to a thread but still can call exec():

pid = clone(fn, stack_aligned, CLONE_VM | SIGCHLD, arg);

The child starts at the function fn(arg). We have allocated some memory for the stack, and this memory must be aligned. There are some important notes (valid at the time of writing) which I learned by reading the source of libc and the Linux kernel:

  • On all supported Linux platforms the stack grows down, except for HP PA-RISC. You can grep the kernel source for “STACK_GROWSUP” to get this information.
  • On all platforms supported by GNU libc, the stack is aligned to 16 bytes, except for the SuperH platform, where it is aligned to 8 bytes. You can grep the glibc source for “STACK_ALIGN” to get this information.
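Because the stack grows down, the pointer handed to clone() is the top of the allocated buffer, rounded down to the alignment boundary. A minimal sketch of that calculation, assuming the common 16-byte case (the helper name aligned_stack_top() and the constant are made up for illustration):

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_ALIGN_BYTES 16 /* assumption: the common 16-byte alignment */

/* Hypothetical helper: for a buffer of `size` bytes starting at `base`,
 * return the highest address not past its end that is a multiple of
 * STACK_ALIGN_BYTES. Suitable only for stacks that grow down. */
void *aligned_stack_top(void *base, size_t size)
{
    uintptr_t top = (uintptr_t)base + size;
    top &= ~(uintptr_t)(STACK_ALIGN_BYTES - 1); /* round down */
    return (void *)top;
}
```

The buffer should be somewhat larger than the stack actually needed, since rounding down can discard up to 15 bytes at the top.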

Note that this trick is tested only on Linux. I failed to make it work on FreeBSD.

Once the child process is created, we take great care not to touch any global variables of the parent process. We then do some file descriptor magic to bind the standard output of the child process to a file descriptor in the parent, and finally execute the given command with its arguments.
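Put together, the whole technique might look roughly like the sketch below. It is not the library's actual code: the struct clone_proc, the functions clone_popen_read() and clone_pclose(), and the 64 KB stack size are all assumptions made for illustration. The stack comes from mmap(), which returns page-aligned memory, so no extra alignment step is needed here.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024) /* assumption: enough for dup2() + execvp() */

/* Hypothetical handle, not the library's struct. It owns everything the
 * child reads, so that memory stays valid until the child is reaped. */
struct clone_proc {
    pid_t pid;
    void *stack;
    int pipe_fds[2];
    char *const *argv; /* caller keeps argv alive until clone_pclose() */
    FILE *fp;
};

/* Runs in the parent's address space: touch only fds, then exec. */
static int child_entry(void *p)
{
    struct clone_proc *cp = p;
    close(cp->pipe_fds[0]);
    if (cp->pipe_fds[1] != STDOUT_FILENO) {
        if (dup2(cp->pipe_fds[1], STDOUT_FILENO) == -1)
            _exit(127);
        close(cp->pipe_fds[1]);
    }
    execvp(cp->argv[0], cp->argv);
    _exit(127); /* exec failed; _exit() leaves the shared stdio buffers alone */
}

/* Reap the child first, then release the memory it was using. */
int clone_pclose(struct clone_proc *cp)
{
    int status = -1;
    if (cp->fp != NULL)
        fclose(cp->fp);
    if (waitpid(cp->pid, &status, 0) == -1)
        status = -1;
    munmap(cp->stack, CHILD_STACK_SIZE);
    free(cp);
    return status;
}

struct clone_proc *clone_popen_read(char *const argv[])
{
    struct clone_proc *cp = calloc(1, sizeof *cp);
    if (cp == NULL)
        return NULL;
    cp->argv = argv;
    if (pipe(cp->pipe_fds) == -1) {
        free(cp);
        return NULL;
    }
    cp->stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (cp->stack == MAP_FAILED) {
        close(cp->pipe_fds[0]);
        close(cp->pipe_fds[1]);
        free(cp);
        return NULL;
    }
    /* The stack grows down, so clone() receives the top of the mapping. */
    cp->pid = clone(child_entry, (char *)cp->stack + CHILD_STACK_SIZE,
                    CLONE_VM | SIGCHLD, cp);
    close(cp->pipe_fds[1]); /* the parent keeps only the read end */
    if (cp->pid == -1) {
        close(cp->pipe_fds[0]);
        munmap(cp->stack, CHILD_STACK_SIZE);
        free(cp);
        return NULL;
    }
    cp->fp = fdopen(cp->pipe_fds[0], "r");
    if (cp->fp == NULL) {
        close(cp->pipe_fds[0]);
        clone_pclose(cp);
        return NULL;
    }
    return cp;
}
```

Note that the child's stack and the argument structure are released only after waitpid() returns, because the child shares the parent's memory until it calls exec() or exits. Since CLONE_FILES is not passed, the child gets its own copy of the file descriptor table, so closing descriptors in the child does not affect the parent.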

You will find detailed examples and use-cases in the source code. A very simplified example follows with no error checks:

fp = popen_noshell("ls", (const char * const *)argv, "r", &pclose_arg, 0);
while (fgets(buf, sizeof(buf)-1, fp)) {
    printf("Got line: %s", buf);
}
status = pclose_noshell(&pclose_arg);

There is a more compatible version of popen_noshell() which accepts the command and its arguments as one whole string, but its usage is discouraged, because it only very naively emulates the way a shell interprets simple arguments.
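To see why naive argument parsing is fragile, consider a splitter that only breaks a command string on spaces and tabs. This is a deliberately simplistic sketch and not the library's parser, which may behave differently; the function name naive_split() is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `cmd` in place on spaces and tabs into a
 * NULL-terminated argv array. Quotes and escapes are NOT understood. */
char **naive_split(char *cmd, size_t *argc_out)
{
    size_t cap = 8, n = 0;
    char **argv = malloc(cap * sizeof *argv);
    if (argv == NULL)
        return NULL;

    char *p = cmd;
    for (;;) {
        p += strspn(p, " \t");          /* skip separators */
        if (*p == '\0')
            break;
        if (n + 1 >= cap) {             /* keep room for the final NULL */
            cap *= 2;
            char **tmp = realloc(argv, cap * sizeof *argv);
            if (tmp == NULL) {
                free(argv);
                return NULL;
            }
            argv = tmp;
        }
        argv[n++] = p;
        p += strcspn(p, " \t");         /* advance to the end of the token */
        if (*p == '\0')
            break;
        *p++ = '\0';                    /* terminate the token in place */
    }
    argv[n] = NULL;
    if (argc_out != NULL)
        *argc_out = n;
    return argv;
}
```

With this splitter, the string ls -l 'my file' yields four arguments, because the quoted file name is cut in two and the quote characters are kept. A real shell would pass it as a single argument, which is why passing a ready-made argv array is the safer interface.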

Benchmark results
I’ve done several tests of how fast popen_noshell() is compared to popen() and even to a bare fork()+exec(). All the results are similar, and therefore I’m publishing only one of the benchmark results:
[Figure: Tested functions on Linux - popen_noshell(), fork(), vfork(), popen(), system()]

Here are the resources which you can download:

I will appreciate any comments on the library.

Author: Ivan Zahariev

An experienced Linux & IT enthusiast, Engineer by heart, Systems architect & developer.

11 thoughts on “A much faster popen() and system() implementation for Linux”

  1. Thanks for this interesting article. One additional interesting technique: the posix_spawn() (http://www.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html) function is supposed to be a modern replacement for the old fork()/exec(). The performance, as far as I tested, is equivalent to that of clone()/exec(). [Unfortunately the API behind it is a bit broken, and does not really allow closing all open process handles, which is just a shame.]

  2. This is great stuff. I wonder, have you ever considered implementing a subset of posix_spawn() instead of, or in addition to, popen() and system()? You should be able to implement popen() and system() using the equivalent of only posix_spawn_file_actions_addclose(), posix_spawn_file_actions_addopen(), and posix_spawn_file_actions_adddup2(), and would have a significantly more versatile interface to boot.

    • I have looked into the posix_spawn() method, but it didn’t seem very well documented at that time, nor stable. In the previous user comment, Xavier Roche reports that the API is also a bit broken.

  3. Nice articles!
    I mentioned your article in this stackoverflow topic

    Another drawback of fork():
    Even if it doesn’t duplicate memory in the end thanks to COW, fork() still fails if there is not enough memory to duplicate the memory used by your parent process.
    It can be quite annoying …

    I looked at your code, and it looks good in general!
    I’ll still stick with posix_spawn() for production code, but your code was inspiring!

    You could try to write shorter lines (less than 80 chars is ideal :),
    and IMHO you should replace exit() by _exit() in void _popen_noshell_child_process().
    Otherwise you may run into troubles esp. with c++ code, see http://www.unixguide.net/unix/programming/1.1.3.shtml

  4. I am wondering if it is allowed to call setuid() or setgid() before calling any variant of exec(). This obviously is allowed when we do fork(). Is it allowed for clone() too?

  5. I just wanted to thank Ivan for developing this, because it’s saved my neck on finishing a research project before a deadline. Not only has this saved me a serious amount of development time, but it’s reduced “system” overhead from 98% down to 15%.

    • Thank you for your appreciation. I’m really glad that the library was useful for educational research. While I was at university I didn’t have enough time to contribute to the community, but it’s never too late. 🙂
