Enthusiasm never stops

Using flock() in Bash without invoking a subshell

The flock(1) utility on Linux manages flock(2) advisory locks from within shell scripts or the command line. This lets you synchronize your Bash scripts with all your other applications written in Perl, Python, C, etc.

I’ll focus on the third usage form where flock() is used inside a Bash script. Here is what the man page suggests:

#!/bin/bash

(
flock -s 200

# ... commands executed under lock ...

) 200>/var/lock/mylockfile

Unfortunately, this invokes a subshell which has the following drawbacks:

  • Variables set inside the subshell are not visible in the main shell script.
  • There is a performance penalty.
  • The syntax coloring in “vim” does not work properly. 🙂
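The first drawback is easy to reproduce. Here is a minimal sketch (the variable names `result`, `after_subshell` and `after_group` are only illustrative) showing that an assignment made inside `( ... )` is lost, while the same assignment inside `{ ... }` survives:

```shell
#!/bin/bash

result="initial"

# Parentheses run the commands in a subshell: the assignment is lost
# as soon as the subshell exits.
(
  result="set in subshell"
)
after_subshell=$result

# Braces group commands in the current shell: the assignment persists.
{
  result="set in group"
}
after_group=$result

echo "after subshell: $after_subshell"
echo "after group:    $after_group"
```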

This motivated my colleague zImage to come up with a usage form which does not invoke a subshell in Bash:

#!/bin/bash

exec {lock_fd}>/var/lock/mylockfile || exit 1
flock -n "$lock_fd" || { echo "ERROR: flock() failed." >&2; exit 1; }

# ... commands executed under lock ...

flock -u "$lock_fd"

Note that you can skip the “flock -u "$lock_fd"” unlock command if it is at the very end of your script. In that case, the lock is released once your process terminates.
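To check that the lock really excludes other processes, something like the following sketch can be used. It assumes flock(1) from util-linux is installed; the temporary lock file and the names `while_locked` and `after_unlock` are made up for the demo. A second bash process opens the same file on its own and tries a non-blocking lock, first while the lock is held and then after it has been released:

```shell
#!/bin/bash
# Hypothetical lock path for this demo; a real script would use a fixed path.
lockfile=$(mktemp)

# Holder: open the lock file on a spare descriptor and lock it.
exec {lock_fd}>"$lockfile" || exit 1
flock -n "$lock_fd" || { echo "ERROR: flock() failed." >&2; exit 1; }

# Contender: a separate bash process opens the file again and tries a
# non-blocking lock. It is expected to fail while we hold the lock.
bash -c 'exec {fd}>>"$1" && flock -n "$fd"' _ "$lockfile"
while_locked=$?

flock -u "$lock_fd"

# The same attempt after the unlock is expected to succeed.
bash -c 'exec {fd}>>"$1" && flock -n "$fd"' _ "$lockfile"
after_unlock=$?

echo "contender exit status while locked: $while_locked"
echo "contender exit status after unlock: $after_unlock"
rm -f "$lockfile"
```

A non-zero status for the first attempt and zero for the second indicates that the lock behaves as intended.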

Author: Ivan Zahariev

An experienced Linux & IT enthusiast, Engineer by heart, Systems architect & developer.

13 thoughts on “Using flock() in Bash without invoking a subshell”

  1. Thanks for this optional usage of flock. What happens if the script crashes and the line ‘flock -u 200’ is never reached? Specifically: will the lock still be released, as it should?

    • The lock will be released upon script termination or server reboot. You can actually skip the line “flock -u 200” at the very end of the script, as it’s redundant.

  2. I tried this – but it doesn’t actually create a lock as far as I can tell. Here’s how I tested it.

    1. open two bash shell windows

    2. Execute in the first bash shell
    exec 200>/var/lock/mylockfile
    then
    flock -n 200

    3. Do the exact same thing in the second bash shell checking exit codes.

    The first shell does not stop the lock of the second shell and the second shell commands return 0 (success) at each step.

    Am I missing something here?

  3. Your lock fd 200 is going to be inherited by child processes, which, if long lived, will prevent the lock from being released. e.g. restarting a daemon… sadly bash doesn’t allow setting fd to CLOEXEC.

    On newer bash, consider:

    exec {lock_fd}>/var/lock/mylockfile

    then bash will find a spare fd and assign it to $lock_fd

    • Great hint! I didn’t know about this Bash feature. Note that there must be no space after the {lock_fd}, or else the magic won’t work. A little bit more info regarding this matter: http://stackoverflow.com/questions/8297415/in-bash-how-to-find-the-lowest-numbered-unused-file-descriptor

      I’ve updated my example following your suggestion. Thanks.
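For readers who have not seen this redirection form before, here is a small sketch of it in isolation (the temporary file stands in for a real lock file). Bash picks a free descriptor, numbered 10 or higher, and stores the number in the named variable. With a space before the `>`, `exec` would instead treat `{lock_fd}` as the name of a command to run.

```shell
#!/bin/bash
lockfile=$(mktemp)   # hypothetical path for the demo

# No space between the closing brace and '>': bash opens the file on a free
# descriptor and stores its number in the variable lock_fd.
exec {lock_fd}>"$lockfile"
echo "bash picked descriptor $lock_fd"

# The descriptor stays open until it is closed explicitly.
exec {lock_fd}>&-
rm -f "$lockfile"
```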

      • Another improvement for you, to prevent the lock being inherited by child processes:

        #!/bin/bash

        exec {lock_fd}>/var/lock/mylockfile || exit 1
        flock -n "$lock_fd" || { echo "ERROR: flock() failed." >&2; exit 1; }
        {

        # … commands executed under lock …

        } {lock_fd}>&-

        flock -u "$lock_fd"

        It “closes” {lock_fd} for the code inside the braces. Of course, it doesn’t really close it: bash first dups {lock_fd} to a spare fd which is closed on exec, and then closes {lock_fd}. Once the code in { … } completes, the spare fd is dup’d back to {lock_fd} and the spare fd is then closed.
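The effect can be observed with a sketch along these lines. It assumes flock(1) is installed and a bash version that accepts `{varname}>&-`; the helper `child_sees_fd` and the temporary lock file are made up for the demo. The helper starts a child bash that tries to use the descriptor number it is given, which only works if the child inherited that descriptor:

```shell
#!/bin/bash
lockfile=$(mktemp)   # hypothetical path for the demo

exec {lock_fd}>"$lockfile" || exit 1
flock -n "$lock_fd" || { echo "ERROR: flock() failed." >&2; exit 1; }

# Hypothetical helper: returns 0 only if a child process inherits fd $1.
child_sees_fd() { bash -c ": >&$1" 2>/dev/null; }

child_sees_fd "$lock_fd"; before=$?
{
  # Inside the braces the descriptor is closed, so children do not get it.
  child_sees_fd "$lock_fd"; inside=$?
} {lock_fd}>&-
# After the braces bash has restored the descriptor.
child_sees_fd "$lock_fd"; after=$?

echo "child sees fd before the block: status $before"
echo "child sees fd inside the block: status $inside"
echo "child sees fd after the block:  status $after"

flock -u "$lock_fd"
rm -f "$lockfile"
```

A status of 0 means the child could use the descriptor. The expectation is 0 before and after the block and non-zero inside it.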

      • Maybe this bash wrap of “flock -o -c …” will be interesting to you:

        It apparently supports flock -o -c … to call a bash function of the same script, same pid, with no forking, in the same way that flock -c calls other external commands.

        Thus your example becomes:

        do_something() {
        # … commands executed under lock …

        }
        flock -o /var/lock/mylockfile -c do_something || { echo "ERROR: flock() failed." >&2; exit 1; }

        There is no need to worry about calling flock -n, or hiding the lock fd from child processes.

        Because the command is called while the flock function wrapper is active, it uses some tricks to avoid trampling on Bash variables or declaring local variables that would affect the final command.

        http://blog.sam.liddicott.com/2016/02/using-flock-in-bash-without-invoking.html

  4. This is not 100% proof since you still have a race condition at `exec {lock_fd}>/var/lock/mylockfile`. The man page for flock has been showing the proper way of doing it without a subshell for decades:

    {
    flock -n 19 || { echo "can't acquire lock"; exit 1; }
    # your commands here
    } 19>/run/my.lock

    There is another way, also shown in the man page, where you can do a one-liner at the start of your script:

    [[ $FLOCK != "$0" ]] && exec env FLOCK="$0" flock -ne "$0" "$0" "$@" || :

    • Hi Alex, where exactly do you see a possible race condition?

      Let’s review the following Bash script:

      #!/bin/bash
       
      exec {lock_fd}>/var/lock/mylockfile || exit 1
      flock -n "$lock_fd" || { echo "ERROR: flock() failed." >&2; exit 1; }
       
      # ... commands executed under lock ...
      
      sleep 600
      

      Running this via “strace” shows the following system calls:

      openat(AT_FDCWD, "/var/lock/mylockfile", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
      fcntl(3, F_DUPFD, 10)                   = 10
      close(3)                                = 0
      
      clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f07d0323490) = 2461
      [pid  2461] execve("/usr/bin/flock", ["flock", "-n", "10"], 0x56215a29c150 /* 77 vars */) = 0
      [pid  2461] flock(10, LOCK_EX|LOCK_NB)  = 0
      [pid  2461] exit_group(0)               = ?
      [pid  2461] +++ exited with 0 +++
      
      clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f07d0323490) = 2462
      [pid  2462] execve("/bin/sleep", ["sleep", "600"], 0x56215a29c150 /* 77 vars */) = 0
      [pid  2462] nanosleep({tv_sec=600, tv_nsec=0}, ^C <unfinished ...>
      

      This is a classical and 100% proof way to obtain a file lock.

      Note that running “flock” via fork() is not a problem. The flock() man page says the following:

      Locks created by flock() are associated with an open file description (see open(2)). This means that duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer to the same lock, and this lock may be modified or released using any of these descriptors.

      Please let me know if I misunderstood where the possible problem is.
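The quoted man page behaviour can be checked from Bash with a sketch like the one below. It assumes flock(1) is installed; the temporary file and the names `fd_a`, `fd_b`, `lock_a`, `lock_b_held` and `lock_b_free` are made up for the demo. Two separate opens of the same file give two open file descriptions. Each flock call runs as a child process that inherits both descriptors, and the lock follows the description, not the process:

```shell
#!/bin/bash
lockfile=$(mktemp)   # hypothetical path for the demo

# Two independent opens of the same file: two open file descriptions.
exec {fd_a}>>"$lockfile"
exec {fd_b}>>"$lockfile"

# A child flock process locks through the inherited fd_a. The lock stays
# in place after that child exits, because this shell keeps fd_a open.
flock -n "$fd_a"; lock_a=$?

# Locking through fd_b is expected to fail: different open file description.
flock -n "$fd_b"; lock_b_held=$?

# Another child releases the lock through the same inherited descriptor.
flock -u "$fd_a"

# Now the lock through fd_b is expected to succeed.
flock -n "$fd_b"; lock_b_free=$?

echo "lock via fd_a:                   status $lock_a"
echo "lock via fd_b while fd_a locked: status $lock_b_held"
echo "lock via fd_b after unlock:      status $lock_b_free"
rm -f "$lockfile"
```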

  5. The solution was almost correct. flock does not use the fd from the parent shell; therefore, the fd must be duped in the flock command itself. Without the dup, you should get bad file descriptor warnings from flock. The following works with most sh flavors: ksh, bash, etc. I use flock 1 1>&9 to show the difference between the file descriptors within the execution. It would work as well with flock -n 9 9>&9 for ease of tracing. For the following examples, use two shell sessions to run the scripts simultaneously or with different start/stop offsets. bash can be substituted for ksh.

    #!/bin/ksh -x
    LOCKFILE=/tmp/lock
    exec 9>>$LOCKFILE
    # example, exit immediately if lock is taken
    flock -x -n 1 1>&9 || {
      ERR=$?
      echo lock taken by $(cat $LOCKFILE)
      exit $ERR
    }
    echo $$ >&9
    sleep 30
    exit 0
    

    Another variant with flock and waiting

    #!/bin/ksh -x
    LOCKFILE=/tmp/lock
    exec 9>>$LOCKFILE
    flock -x -w 10 1 1>&9 || {
      ERR=$?
      echo lock taken by $(cat $LOCKFILE) wait exceeded 10s
      exit $ERR
    }
    echo $$ >&9
    sleep 30
    exit 0
    
    • Hi, Thomas,

      If you say that my implementation doesn’t work in Bash (I don’t know about “ksh”), please support your claim with a proof-of-concept example.

      Additionally, the man page of execve() states that “By default, file descriptors remain open across an execve()”. I don’t see why we need to dup() anything in such a case.

      Does “ksh” support the syntax “exec {lock_fd}>/var/lock/mylockfile”? The magic here is that the next free file descriptor is opened for the file and then assigned to the variable “lock_fd”.
