Using flock() in Bash without invoking a subshell

July 31, 2013 by Ivan Zahariev 13 Comments

The flock(1) utility on Linux manages flock(2) advisory locks from within shell scripts or the command line. This lets you synchronize your Bash scripts with all your other applications written in Perl, Python, C, etc.

I’ll focus on the third usage form where flock() is used inside a Bash script. Here is what the man page suggests:

#!/bin/bash

(
flock -s 200

# ... commands executed under lock ...

) 200>/var/lock/mylockfile

Unfortunately, this invokes a subshell which has the following drawbacks:

You cannot pass variable values from the subshell back to the main shell script.
There is a performance penalty, because starting the subshell requires a fork().
The syntax coloring in “vim” does not work properly. 🙂
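
The first drawback can be seen in a minimal sketch. The variable names here are made up for illustration and no locking is involved; the snippet only contrasts a ( ... ) subshell with a { ... } brace group, which runs in the current shell:

```bash
#!/bin/bash
# An assignment made inside a subshell is lost when the subshell exits.
result="unset"
(
    result="set in subshell"
)
after_subshell=$result

# The same assignment inside a brace group happens in the current shell.
{
    result="set in brace group"
}
after_group=$result

echo "after subshell:    $after_subshell"
echo "after brace group: $after_group"
```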

This motivated my colleague zImage to come up with a usage form which does not invoke a subshell in Bash:

#!/bin/bash

exec {lock_fd}>/var/lock/mylockfile || exit 1
flock -n "$lock_fd" || { echo "ERROR: flock() failed." >&2; exit 1; }

# ... commands executed under lock ...

flock -u "$lock_fd"

Note that you can skip the “flock -u "$lock_fd"” unlock command if it is at the very end of your script. In that case, the lock is released once your process terminates.

Author: Ivan Zahariev

An experienced Linux & IT enthusiast, Engineer by heart, Systems architect & developer.

13 thoughts on “Using flock() in Bash without invoking a subshell”


Al_
April 2, 2015 at 5:53 pm

Thanks for this optional usage of flock. What happens if the script crashes and the line ‘flock -u 200’ is never reached? Specifically: will the lock still be released, as it should?

- Ivan Zahariev
  April 2, 2015 at 7:42 pm
  
  The lock will be released upon script termination or server reboot. You can actually skip the line “flock -u 200” at the very end of the script, as it’s redundant.
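
One way to check this claim is to kill a lock holder abruptly and then try to take the lock again. The following is only a sketch: the temporary lock file, the variable names, and the polling loop are assumptions made for the demo, and it needs Bash 4.1 or newer plus flock(1) from util-linux.

```bash
#!/bin/bash
lockfile=$(mktemp)    # hypothetical lock file for this demo

# Holder: takes the lock, then turns into "sleep" and never unlocks.
bash -c 'exec {fd}>"$1"; flock "$fd" || exit 1; exec sleep 30' _ "$lockfile" &
holder=$!

# Probe with our own descriptor until the holder owns the lock.
exec {probe}>"$lockfile"
held=no
for attempt in $(seq 1 50); do
    if flock -n "$probe"; then
        flock -u "$probe"    # we got it first: give it back and retry
        sleep 0.1
    else
        held=yes             # the holder has the lock now
        break
    fi
done

# Simulate a crash: SIGKILL leaves no chance to run "flock -u".
kill -KILL "$holder"
wait "$holder" 2>/dev/null || true

# The kernel closed the dead process's descriptors, so the lock is free.
flock -n "$probe" && released=yes || released=no
echo "held while alive: $held, released after kill: $released"

exec {probe}>&-
rm -f "$lockfile"
```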
  
required
September 25, 2015 at 8:13 pm

I tried this – but it doesn’t actually create a lock as far as I can tell. Here’s how I tested it.

1. open two bash shell windows

2. Execute in the first bash shell
exec 200>/var/lock/mylockfile
then
flock -n 200

3. Do the exact same thing in the second bash shell checking exit codes.

The lock held by the first shell does not block the second shell, and the commands in the second shell return 0 (success) at each step.

Am I missing something here?

- required
  September 28, 2015 at 3:13 am
  
  Disregard earlier comment. flock -n 200 returns 1 in second bash shell as expected. Must have fat-fingered it.
  
Sam Liddicott
January 11, 2016 at 2:21 pm

Your lock fd 200 is going to be inherited by child processes, which, if long lived, will prevent the lock from being released. e.g. restarting a daemon… sadly bash doesn’t allow setting fd to CLOEXEC.

On newer bash, consider:

exec {lock_fd}>/var/lock/mylockfile

then bash will find a spare fd and assign it to $lock_fd
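
The inheritance problem can be reproduced with a short sketch. Everything here is an assumption for the demo: the temporary lock file, the variable names, and a background "sleep" standing in for a long-lived daemon. It needs Bash 4.1 or newer and flock(1) from util-linux.

```bash
#!/bin/bash
lockfile=$(mktemp)    # hypothetical lock file for this demo

# A "script" that takes the lock, starts a long-lived child, and exits.
# The child inherits the lock descriptor; its PID is printed for later.
child=$(bash -c 'exec {fd}>"$1"; flock -n "$fd" || exit 1
                 sleep 30 >/dev/null & echo $!' _ "$lockfile")

# The script is gone, but its child still holds the inherited descriptor.
exec {probe}>"$lockfile"
flock -n "$probe" && while_child_alive=free || while_child_alive=held

# Once the child dies, the last descriptor closes and the lock is freed.
kill "$child"
flock -w 5 "$probe" && after_child_exit=free || after_child_exit=held

echo "child alive: $while_child_alive, child gone: $after_child_exit"
exec {probe}>&-
rm -f "$lockfile"
```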

- Ivan Zahariev
  January 27, 2016 at 10:19 pm
  
  Great hint! I didn’t know about this Bash feature. Note that there must be no space after the {lock_fd}, or else the magic won’t work. A little bit more info regarding this matter: http://stackoverflow.com/questions/8297415/in-bash-how-to-find-the-lowest-numbered-unused-file-descriptor
  
  I’ve updated my example following your suggestion. Thanks.
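
A small sketch of the {varname} redirection discussed here, assuming Bash 4.1 or newer. The temporary file and variable names are placeholders. Bash allocates a descriptor numbered 10 or higher and stores it in the variable; with a space after the braces, {lock_fd} is treated as a command name instead:

```bash
#!/bin/bash
lockfile=$(mktemp)    # hypothetical file for this demo

# No space between "}" and ">": bash allocates a free fd (10 or higher)
# and stores its number in lock_fd.
exec {lock_fd}>"$lockfile"
echo "bash picked fd $lock_fd"

# With a space, "{lock_fd}" is just a word, so exec tries to run a
# command of that name. A subshell keeps the failure from ending the demo.
if ( exec {lock_fd} >"$lockfile" ) 2>/dev/null; then
    with_space=worked
else
    with_space=failed
fi
echo "variant with a space: $with_space"

exec {lock_fd}>&-     # close the descriptor stored in lock_fd
rm -f "$lockfile"
```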
  
  - Sam Liddicott
    February 1, 2016 at 3:22 pm
    
    Another improvement for you, to prevent the lock being inherited by child processes:
    
    #!/bin/bash
    
    exec {lock_fd}>/var/lock/mylockfile || exit 1
    flock -n "$lock_fd" || { echo "ERROR: flock() failed." >&2; exit 1; }
    {
    
    # … commands executed under lock …
    
    } {lock_fd}>&-
    
    flock -u "$lock_fd"
    
    It “closes” {lock_fd} for the code inside the braces. Of course it doesn’t really close it: it first dup’s {lock_fd} to a spare fd that will be closed before exec, and then closes {lock_fd}. Once the code in { … } completes, this spare fd is dup’d back to {lock_fd} and the spare fd is then closed.
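
The effect of the brace group can be observed with a sketch like the following. The helper child_sees_fd, the temporary lock file, and the result variables are assumptions made for the demo, not part of the comment above. It needs Bash 4.1 or newer and flock(1) from util-linux.

```bash
#!/bin/bash
lockfile=$(mktemp)    # hypothetical lock file for this demo
exec {lock_fd}>"$lockfile"
flock -n "$lock_fd"

# Hypothetical helper: can a child process use descriptor $1?
child_sees_fd() { bash -c "true >&$1" 2>/dev/null; }

child_sees_fd "$lock_fd" && before=open || before=closed
{
    # Children started here no longer inherit the lock descriptor...
    child_sees_fd "$lock_fd" && inside=open || inside=closed
    # ...but the lock is still held: a second open of the file cannot take it.
    flock -n "$lockfile" true && lock_inside=free || lock_inside=held
} {lock_fd}>&-
# After the group, bash has restored the descriptor.
child_sees_fd "$lock_fd" && after=open || after=closed

echo "before=$before inside=$inside lock_inside=$lock_inside after=$after"
exec {lock_fd}>&-
rm -f "$lockfile"
```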
  - Sam Liddicott
    February 1, 2016 at 3:29 pm
    
    Maybe this bash wrap of “flock -o -c …” will be interesting to you:
    
    It apparently supports flock -o -c … to call a bash function of the same script, same pid, with no forking, in the same way that flock -c calls other external commands.
    
    Thus your example becomes:
    
    do_something() {
    # … commands executed under lock …
    …
    }
    flock -o /var/lock/mylockfile -c do_something || { echo "ERROR: flock() failed." >&2; exit 1; }
    
    There is no need to worry about calling flock -n, or hiding the lock fd from child processes.
    
    Because the command is called while the flock function wrapper is active, it uses some tricks to avoid trampling on Bash variables or declaring local variables that would affect the final command.
    
    http://blog.sam.liddicott.com/2016/02/using-flock-in-bash-without-invoking.html
  - Ivan Zahariev
    February 4, 2016 at 8:47 pm
    
    Thanks for sharing.
Alex W
October 28, 2018 at 1:01 pm

This is not 100% proof since you still have a race condition at `exec {lock_fd}>/var/lock/mylockfile`. The man page for flock has been showing the proper way of doing it without a subshell for decades:

{
flock -n 19 || { echo "can't acquire lock" >&2; exit 1; }
# your commands here
} 19>/run/my.lock

There is another way, also shown in the man page, where you can do a one-liner at the start of your script:

[[ $FLOCK != "$0" ]] && exec env FLOCK="$0" flock -ne "$0" "$0" "$@" || :

- Ivan Zahariev
  October 29, 2018 at 10:53 pm
  
  Hi Alex, where exactly do you see a possible race condition?
  
  Let’s review the following Bash script:
```
#!/bin/bash
 
exec {lock_fd}>/var/lock/mylockfile || exit 1
flock -n "$lock_fd" || { echo "ERROR: flock() failed." >&2; exit 1; }
 
# ... commands executed under lock ...

sleep 600
```
  Running this via “strace” shows the following system calls:
```
openat(AT_FDCWD, "/var/lock/mylockfile", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fcntl(3, F_DUPFD, 10)                   = 10
close(3)                                = 0

clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f07d0323490) = 2461
[pid  2461] execve("/usr/bin/flock", ["flock", "-n", "10"], 0x56215a29c150 /* 77 vars */) = 0
[pid  2461] flock(10, LOCK_EX|LOCK_NB)  = 0
[pid  2461] exit_group(0)               = ?
[pid  2461] +++ exited with 0 +++

clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f07d0323490) = 2462
[pid  2462] execve("/bin/sleep", ["sleep", "600"], 0x56215a29c150 /* 77 vars */) = 0
[pid  2462] nanosleep({tv_sec=600, tv_nsec=0}, ^C <unfinished ...>
```
  This is a classical and 100% proof way to obtain a file lock.
  
  Note that running “flock” via fork() is not a problem. The flock() man page says the following:
  
  Locks created by flock() are associated with an open file description (see open(2)). This means that duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer to the same lock, and this lock may be modified or released using any of these descriptors.
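
The quoted behavior can be sketched in Bash with three descriptors: two that share one open file description and one opened separately. The temporary lock file and the variable names are assumptions for the demo, which needs Bash 4.1 or newer and flock(1) from util-linux.

```bash
#!/bin/bash
lockfile=$(mktemp)    # hypothetical lock file for this demo

exec {fd_a}>"$lockfile"    # first open file description
exec {fd_b}>&"$fd_a"       # duplicate of fd_a: same open file description
exec {fd_c}>"$lockfile"    # separate open(): a second open file description

flock -n "$fd_a" && lock_a=ok || lock_a=failed
# Same open file description: the lock is already ours, so this succeeds.
flock -n "$fd_b" && lock_b=ok || lock_b=failed
# Different open file description: conflicts, even within one process.
flock -n "$fd_c" && lock_c=ok || lock_c=failed

# Unlocking through the duplicate releases the one shared lock.
flock -u "$fd_b"
flock -n "$fd_c" && lock_c_after=ok || lock_c_after=failed

echo "a=$lock_a b=$lock_b c=$lock_c c_after_unlock=$lock_c_after"
exec {fd_a}>&- {fd_b}>&- {fd_c}>&-
rm -f "$lockfile"
```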
  
  Please let me know if I misunderstood where the possible problem is.
  
Thomas Swan
April 9, 2019 at 6:32 am

The solution was almost correct. flock does not use the fd from the parent shell; therefore, the fd must be duped in the flock command. Without the dup, you should get bad file descriptor warnings from flock. The following works with most sh flavors: ksh, bash, etc. I use flock 1 1>&9 to show the difference between the file descriptors within the execution. It would work as well with flock -n 9 9>&9 for ease of tracing. In the following examples, use two shell sessions to run the scripts simultaneously or with different offsets in start/stop. bash can be substituted for ksh.
```
#!/bin/ksh -x
LOCKFILE=/tmp/lock
exec 9>>$LOCKFILE
# example, exit immediately if lock is taken
flock -x -n 1 1>&9 || {
  ERR=$?
  echo lock taken by $(cat $LOCKFILE)
  exit $ERR
}
echo $$ >&9
sleep 30
exit 0
```
Another variant with flock and waiting:
```
#!/bin/ksh -x
LOCKFILE=/tmp/lock
exec 9>>$LOCKFILE
flock -x -w 10 1 1>&9 || {
  ERR=$?
  echo lock taken by $(cat $LOCKFILE) wait exceeded 10s
  exit $ERR
}
echo $$ >&9
sleep 30
exit 0
```
- Ivan Zahariev
  April 9, 2019 at 10:15 am
  
  Hi, Thomas,
  
  If you say that my implementation doesn’t work in Bash (I don’t know about “ksh”), please support your claim with a proof-of-concept example.
  
  Additionally, the man page of execve() states that “By default, file descriptors remain open across an execve()”. I don’t see why we need to dup() anything in such a case.
  
  Does “ksh” support the syntax “exec {lock_fd}>/var/lock/mylockfile”? The magic here is that the next free file descriptor is opened for the file and then assigned to the variable “lock_fd”.
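
For Bash, descriptor inheritance across fork() and execve() can be checked with a sketch like this one. The temporary file and variable names are placeholders, and Bash 4.1 or newer is assumed. It says nothing about how ksh behaves.

```bash
#!/bin/bash
lockfile=$(mktemp)    # hypothetical file for this demo
exec {lock_fd}>"$lockfile"

# The child bash is started via fork() + execve(). It refers to the
# descriptor purely by number and does not open the file itself.
bash -c "echo 'written by child' >&$lock_fd" && child_status=ok || child_status=failed
contents=$(cat "$lockfile")

echo "child: $child_status, file contents: $contents"
exec {lock_fd}>&-
rm -f "$lockfile"
```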
  