fork | /contrib/famzah

Many years ago I wrote the popen_noshell library, which significantly speeds up the popen() call. It seems that there is now a standard and very efficient way to achieve the same: the posix_spawn() call. Its interface is a bit clunky and complicated, but it could hardly be simple, because posix_spawn() provides both great efficiency and a lot of flexibility.

UPDATE: Here are some benchmarks for posix_spawn().

Let us first examine the different ways of spawning a process on Linux 4.10. Here is how each of the following functions is implemented:

fork(): _do_fork(SIGCHLD, 0, 0, NULL, NULL, 0);
vfork(): _do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0, 0, NULL, NULL, 0);
clone(): _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr, tls);
posix_spawn(): implemented by using clone(); no native Linux kernel syscall, yet

In the latest versions of the GNU libc, posix_spawn() calls clone() with flags equivalent to the ones vfork() uses. Therefore, a logical question pops up: why not use vfork() directly? The answer: “The problem are the atfork handlers which can be registered. In the child process they can modify the address space.”

Of course, it would be best if posix_spawn() were implemented as a system call in the Linux kernel. Then we wouldn’t need to depend on the GNU libc implementation, which by the way differs between glibc versions. Additionally, the Linux kernel could then spawn processes even faster.

The current implementation of posix_spawn() in the GNU libc is basically a vfork() with a limited, safe set of functions which can be executed inside the vfork()’ed child. When using vfork(), the child shares the memory and the stack of the parent process, so we need to be extra careful indeed. There are plenty of warnings in the man pages about the usage of vfork().

I am glad that my implementation and that of the GNU libc developers are very similar. They did a better job though, because they handle a few corner cases like custom signal handlers in the parent, etc. It’s worth reviewing the comments and the source code of the patch which introduces the new, very efficient posix_spawn() implementation in the GNU libc.

The above patch was released as part of glibc 2.24 on 2016-08-05.

When glibc 2.24 gets into the mainstream Linux distributions, we can start using posix_spawn(), which should be as efficient as my popen_noshell implementation.

P.S. If you want to read even more technical details about the *fork() calls, see this page and this one.

LINE="col0 col1 col2 col3 col4 "
COLS=()
for val in $LINE ; do
  COLS+=("$val")
done
echo "${COLS[0]}"; # prints "col0"
echo "${COLS[1]}"; # prints "col1"
echo "${COLS[2]}"; # prints "col2"
echo "${COLS[3]}"; # prints "col3"
echo "${COLS[4]}"; # prints "col4"
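For comparison, here is a minimal sketch of another way to do the same split with the `read` builtin. This is not the code from this post, only a commonly used alternative, and the sample value of `LINE` is made up for illustration. As far as I know it does not fork a subshell either, since `read` is a builtin and the here-string is not a command substitution. Note that `read` consumes only the first line of its input.

```shell
#!/usr/bin/env bash
# Sketch only: split on white-space with the read builtin.
# The sample LINE below is hypothetical; it mixes spaces and a tab.
LINE=$'col0 col1\tcol2   col3  col4 '

# IFS is set for this single read command only, so the global IFS is untouched.
# -r: backslashes are kept literally; -a COLS: store the fields in the array COLS.
IFS=$' \t\n' read -r -a COLS <<< "$LINE"

echo "${#COLS[@]}"  # number of columns found
echo "${COLS[0]}"   # first column
echo "${COLS[4]}"   # last column
```

Because the fields never go through an unquoted expansion, no pathname expansion is attempted on them.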

#
# OLD CODE
# Update: Aug/2016: I've encountered a bug in Bash where this splitting doesn't work as expected! Please see the comments below.
#

# Here is the effective solution which I found with my colleagues at work:
COLS=( $LINE ); # parses columns without executing a subshell
RESULT="${COLS[0]}"; # returns first column (0-based indexes)

# Here is an example:
LINE="col0 col1 col2 col3 col4 " # white-space including tab chars
COLS=( $LINE ); # parses columns without executing a subshell
echo "${COLS[0]}"; # prints "col0"
echo "${COLS[1]}"; # prints "col1"
echo "${COLS[2]}"; # prints "col2"
echo "${COLS[3]}"; # prints "col3"
echo "${COLS[4]}"; # prints "col4"
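One general hazard of splitting through an unquoted expansion is that Bash also performs pathname expansion on the resulting words, so a column like `*` would be replaced by file names. This is not necessarily the bug mentioned in the update above. The sketch below guards the array assignment with `set -f`; the sample value of `LINE` is an assumption chosen to show the effect.

```shell
#!/usr/bin/env bash
# Sketch only: keep glob characters literal while splitting by white-space.
LINE='col0 * col2'   # hypothetical input with a glob character as a column

set -f               # turn off pathname expansion (globbing)
COLS=( $LINE )       # word splitting on IFS still happens; "*" stays literal
set +f               # turn pathname expansion back on

echo "${#COLS[@]}"   # prints "3"
echo "${COLS[1]}"    # prints "*"
```

If the script already runs with `set -f` for other reasons, the unconditional `set +f` would undo that, so a real script may want to save and restore the previous state instead.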

Tag Archives: fork

posix_spawn() on Linux

Bash: Split a string into columns by white-space without invoking a subshell