Enthusiasm never stops


“iperf” and “iftop” accuracy

While working on my latest pet project, which involved 10 GigE transfers, I noticed a significant difference between the results shown by “iperf” and “iftop”. A fellow blogger also noticed this discrepancy. To get to the bottom of this, I ran some additional tests using different MTU sizes and observed the output of “iperf”, “iftop”, “iptraf”, and the raw Linux network device counters as seen by “ifconfig”.

The test results are summarized in an online spreadsheet: https://goo.gl/MvJC8K
iperf vs. iftop vs. iptraf vs. raw stats - spreadsheet - preview

Some notes about each application:

  • iperf – this tool measures the TCP performance, as per documentation; therefore it counts the useful payload in a TCP/IP transfer; this is layer4 in the OSI model
  • iftop – this tool counts all IP packets, as per documentation; my tests show that it also operates on layer4, just as “iperf”, because ARP traffic (on layer3) is not counted at all; the fact that “iftop” cares about connections+ports also suggests that it operates at layer4
  • iptraf – this tool seems to be too old now, and its results were off by a factor of 4 to 5
  • ifconfig – shows the most low-level statistics, namely bytes that passed as RX or TX through the network device; the most trusted source of performance data

We notice that both “iperf” and “iftop” measure the useful payload data that we can transfer per second. Since all OSI layers have some overhead, let’s take a look at what theory says about bandwidth efficiency in Ethernet:

  • with a standard MTU frame of 1500 bytes, we get 94.93% efficiency (5.07% overhead)
  • with a jumbo MTU frame of 9000 bytes, we get 99.14% efficiency (0.86% overhead)

Those numbers correspond very closely with the results shown by “iperf”.
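
The two efficiency figures can be reproduced with a little arithmetic. The sketch below assumes plain TCP over IPv4 without TCP options (20 + 20 bytes of headers inside the MTU) and 38 bytes of Ethernet framing per frame (preamble 8, header 14, FCS 4, inter-frame gap 12, no VLAN tag). The function name is my own, not part of any tool.

```shell
# tcp_efficiency MTU -> TCP payload efficiency on the wire, in percent
tcp_efficiency() {
    awk -v mtu="$1" 'BEGIN {
        payload = mtu - 40   # MTU minus IPv4 (20) and TCP (20) headers
        wire    = mtu + 38   # MTU plus Ethernet framing overhead
        printf "%.2f\n", 100 * payload / wire
    }'
}

tcp_efficiency 1500   # prints 94.93
tcp_efficiency 9000   # prints 99.14
```

With TCP options enabled (timestamps, for example) the payload per packet shrinks a little, so real-world numbers can come out slightly lower.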

It’s only “iftop” which differs a lot. Analysis of its source code reveals the reason for this and how we must interpret the displayed results:

# ui.c

void ui_print() {
    mvaddstr(y, COLS - 8 * HISTORY_DIVISIONS - 8, "rates:");


void draw_totals(host_pair_line* totals) {
    for(j = 0; j < HISTORY_DIVISIONS; j++) {
        readable_size((totals->sent[j] + totals->recv[j]) , buf, 10, 1024, options.bandwidth_in_bytes);

# ui_common.c

 * Format a data size in human-readable format
void readable_size(float n, char* buf, int bsize, int ksize, int bytes) {
    float size = 1;
    while(1) {
      size *= ksize;
        snprintf(buf, bsize, " %4.2f%s", n / size, bytes ? unit_bytes[i] : unit_bits[i]);

The authors of “iftop” decided to scale to Gibibit (multiples of 1024) instead of the more common Gigabit (multiples of 1000). This makes the discrepancy in the “iftop” figures grow as the transfer rate gets higher. At the Giga level the difference is about 7% (1024³ / 1000³ ≈ 1.074).

Once the “iftop” values are converted from Gibibit to Gigabit, they also match the results from “iperf” and the raw Linux network device counters.
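
The conversion described above is a one-liner. Here is a minimal sketch. The function name and the sample reading of 9.22 are made up for illustration and are not taken from my test results.

```shell
# gibit_to_gbit VALUE -> the same rate in Gbit/s (multiples of 1000)
gibit_to_gbit() {
    awk -v v="$1" 'BEGIN { printf "%.2f\n", v * 1024 * 1024 * 1024 / 1000000000 }'
}

gibit_to_gbit 1      # prints 1.07
gibit_to_gbit 9.22   # hypothetical "iftop" reading; prints 9.90
```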


Linux md-RAID scalability on a 10 Gigabit network

The question for today is – does Linux md-RAID scale to 10 Gbit/s?

I wanted to build a proof of concept for a scalable, highly available, fault tolerant, distributed block storage, which utilizes commodity hardware, runs on a 10 Gigabit Ethernet network, and uses well-tested open-source technologies. This is a simplified version of Ceph. The only single point of failure in this cluster is the client itself, which is inevitable in any solution.

Here is an overview diagram of the setup:
Linux md-RAID scalability on a 10 Gigabit network

My test lab is hosted on AWS:

  • 3x “c4.8xlarge” storage servers
    • each of them has 5x 50 GB General Purpose (SSD) EBS attached volumes which provide up to 160 MiB/s and 3000 IOPS for extended periods of time; practical tests showed 100 MB/s sustained sequential read/write performance per volume
    • each EBS volume is managed via LVM and there is one logical volume with size 15 GB
    • each 15 GB logical volume is being exported by iSCSI to the client machine
  • 1x “c4.8xlarge” client machine
    • the client machine initiates an iSCSI connection to each single 15 GB logical volume, and thus has 15 identical iSCSI block devices (3 storage servers x 5 block devices = 15 block devices)
    • to achieve a 3x replication factor, the block devices from each storage server are grouped into 5x mdadm software RAID-1 (mirror) devices; each RAID-1 device “md1” to “md5” contains three disks from a different storage server, so that if one or two of the storage servers fail, this won’t affect the operation of the whole RAID-1 device
    • all RAID-1 devices “md1” to “md5” are grouped into a single RAID-0 (stripe), in order to utilize the full bandwidth of all devices through a single block device, namely the “md99” RAID-0 device, which also combines the capacity of all “md1” to “md5” devices and equals 75 GB
  • 10 Gigabit network in a VPC using Jumbo frames
  • the storage servers and the client machine were limited on boot to 4 CPUs and 2 GB RAM, in order to minimize the effect of the Linux disk cache
  • only sequential and random reading were benchmarked
  • Linux md RAID-1 (mirror) does not read from all underlying disks by default, so I had to create a RAID-1E (mirror) configuration; the “mdadm create” options follow: --level=10 --raid-devices=3 --layout=o3
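
To make the layout easier to picture, here is a dry-run sketch that only prints the “mdadm” commands for the five three-way mirrors and the stripe on top of them. The device names “/dev/iscsi/sXvY” are hypothetical placeholders (X = storage server, Y = volume), and the function name is my own. Nothing is executed against real devices.

```shell
# print_md_layout: dry run; only echoes the mdadm commands, runs nothing.
# /dev/iscsi/sXvY are hypothetical names for the 15 iSCSI block devices.
print_md_layout() {
    local vol server members stripes=""
    for vol in 1 2 3 4 5; do
        members=""
        for server in 1 2 3; do
            # one disk from each storage server goes into every mirror
            members="$members /dev/iscsi/s${server}v${vol}"
        done
        echo "mdadm --create /dev/md${vol} --level=10 --raid-devices=3 --layout=o3${members}"
        stripes="$stripes /dev/md${vol}"
    done
    # stripe across the five mirrors
    echo "mdadm --create /dev/md99 --level=0 --raid-devices=5${stripes}"
}

print_md_layout
```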


The “cp” command may corrupt your files on Debian Wheezy

We recently had two files corrupted on Debian Wheezy (the current “stable” release). The first one had some garbage, instead of the real data, the other had only zero characters. Only a small part of the files of about 3K was corrupted. This affects both “ext3” and “ext4” file-systems.

It turns out to be a read of already freed memory in “cp” from coreutils 8.11 through 8.19, reported to GNU in Oct/2012. It was also reported to Debian almost a year ago, in Apr/2014, with severity “grave”.

Today we test if the bug is fixed using the PoC given in the original GNU bug report:

$ perl -e 'for (1..3333) { sysseek (*STDOUT, 4096, 1)' -e '&& syswrite (*STDOUT, "a" x 1024) or die "$!"}' > j

$ valgrind cp j j2

==13175== Memcheck, a memory error detector
==13175== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==13175== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==13175== Command: cp j j2
==13175== Invalid read of size 4
==13175==    at 0x8051229: ??? (in /bin/cp)
==13175==    by 0x153FFF: ???
==13175==  Address 0x424ed0c is 1,356 bytes inside a block of size 1,440 free'd
==13175==    at 0x40283EE: realloc (vg_replace_malloc.c:632)
==13175==    by 0x805820B: ??? (in /bin/cp)
==13175==    by 0x153FFF: ???


==15843== ERROR SUMMARY: 15 errors from 9 contexts (suppressed: 25 from 6)

It turns out that the bug is not fixed in Debian. Unfortunately, upgrading to the “coreutils” package from Jessie, where this bug is not present, is not an option. That package depends on a newer “libc6” and furthermore would introduce too many (untested) changes to the core utilities.
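
Besides valgrind, a byte-for-byte comparison of the copy is a cheap sanity check. The helper below is a minimal sketch with a name of my own choosing. A clean result does not prove that the bug is absent, because a read of freed memory corrupts data only some of the time.

```shell
# copy_and_verify SRC DST: copy with cp, then compare byte-for-byte.
# Returns non-zero if the copy fails or the two files differ.
copy_and_verify() {
    cp -- "$1" "$2" && cmp -- "$1" "$2"
}

# Usage with the PoC file from above:
#   copy_and_verify j j2 && echo "copy is identical"
```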

Here is how to rebuild the “coreutils” package by applying the “cp” data corruption patch:

root@machine1:~# cowbuilder --login

COW-machine1:~# apt-get update
COW-machine1:~# apt-get upgrade

COW-machine1:~# mkdir /root/coreutils
COW-machine1:~# cd /root/coreutils

COW-machine1:~/coreutils# apt-get source coreutils
COW-machine1:~/coreutils# apt-get build-dep coreutils

COW-machine1:~/coreutils# cd coreutils-8.13
COW-machine1:~/coreutils/coreutils-8.13# wget 'http://git.savannah.gnu.org/cgit/coreutils.git/patch/?id=64aef5fb9afecc023a6e719da161dbbf450908b8' -O cp-avoid_data_corrupting_free_memory_read.patch

COW-machine1:~/coreutils/coreutils-8.13# patch -p1 < cp-avoid_data_corrupting_free_memory_read.patch
COW-machine1:~/coreutils/coreutils-8.13# DEBFULLNAME='Admin Team' DEBEMAIL='box@example.com' dch --local '~patched' 'Local build with cp data corruption patch'
COW-machine1:~/coreutils/coreutils-8.13# dpkg-buildpackage -b -rfakeroot

root@machine1:~# cp /var/cache/pbuilder/build/cow.1385/root/coreutils/coreutils_8.13-3.5~patched1_i386.deb /root/tmp/

Finally, you need to install the “.deb” file on your system and prevent APT from auto-upgrading it. You will need to rebuild it every time Debian “stable” releases an update for “coreutils”. This doesn’t happen that often. Furthermore, we hope that Debian will react to the bug report and fix the bug in their source tree for Wheezy “stable”.


MemAvailable metric for Linux kernels before 3.14 in /proc/meminfo

A great new metric has been introduced in “/proc/meminfo” in the Linux 3.14 kernel — MemAvailable:

An estimate of how much memory is available for starting new applications, without swapping. Calculated from MemFree, SReclaimable, the size of the file LRU lists, and the low watermarks in each zone.

The estimate takes into account that the system needs some page cache to function well, and that not all reclaimable slab will be reclaimable, due to items being in use. The impact of those factors will vary from system to system.

I recommend that you read the kernel commit description for further details.
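
As I understand the kernel commit, the estimate boils down to the arithmetic below. This is a sketch, not the kernel code itself: the function name and argument order are my own, all values are in kB, and WMARK_LOW stands for the sum of the per-zone “low” watermarks from “/proc/zoneinfo” converted from pages to kB.

```shell
# mem_available_kb MEMFREE ACTIVE_FILE INACTIVE_FILE SRECLAIMABLE WMARK_LOW
mem_available_kb() {
    awk -v free="$1" -v act="$2" -v inact="$3" -v slab="$4" -v low="$5" '
    function min(a, b) { return a < b ? a : b }
    BEGIN {
        avail = free - low                     # keep the low watermarks untouched
        cache = act + inact                    # file LRU lists (page cache)
        avail += cache - min(cache / 2, low)   # some page cache must stay
        avail += slab - min(slab / 2, low)     # not all slab can be reclaimed
        if (avail < 0) avail = 0
        printf "%d\n", avail
    }'
}

# Made-up sample values, in kB:
mem_available_kb 1000000 400000 200000 100000 50000   # prints 1550000
```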

Since many people are still using Linux kernels before 3.14, I’ve backported this kernel patch to Perl. You can download the sources from GitHub: https://github.com/famzah/linux-memavailable-procfs

Many system administrators rely on the “free” tool to get a quick overview of the system’s memory usage. Unfortunately, the latest “procps” package still doesn’t interpret the “MemAvailable” metric, and even if it did, the metric doesn’t exist in Linux kernels before 3.14. Actually, the developers of the “procps-ng” package (which is the Debian, Fedora and openSUSE fork of “procps”) have reacted and done the same thing as me. For kernels before 3.14 they emulate the metric in the same way, and for kernels 3.14 and later they display the native metric from “/proc/meminfo”. This makes my Perl port more or less redundant.

This is the reason I wrote a quick replacement of the “free” tool in Perl. A few examples of it follow.

Typical memory usage overview, in MBytes:

famzah@vbox:~$ ./free.pl -m
             total       used       free  anonymous     kernel     caches     others
Mem:          2488       1228       1259        608         24        580         15
  -/+ caches              648       1840
Swap:         1565          0       1565

Typical memory usage overview, in percentage:

famzah@vbox:~$ ./free.pl -mp
             total       used       free  anonymous     kernel     caches     others
Mem:          2488        49%        51%        24%         1%        23%         1%
  -/+ caches              26%        74%
Swap:         1565         0%       100%

Extended memory usage overview, in MBytes:

famzah@vbox:~$ ./free.pl -me
             total       used       free  anonymous     kernel     caches     others
Mem:          2488       1228       1260        608         24        580         14
  -/+ caches              647       1840
Swap:         1565          0       1565

Extended memory usage info:
  Buffers                  83
  Cached                  785
  SwapCached                0
  Shmem                   308
  AnonPages               300
  Mapped                  122
  Unevict+Mlocked           5
  Dirty+Writeback           0
  NFS+Bounce                0

Extended memory usage overview, in percentage:

famzah@vbox:~$ ./free.pl -mep
             total       used       free  anonymous     kernel     caches     others
Mem:          2488        49%        51%        24%         1%        23%         1%
  -/+ caches              26%        74%
Swap:         1565         0%       100%

Extended memory usage info:
  Buffers                  3%
  Cached                  32%
  SwapCached               0%
  Shmem                   12%
  AnonPages               12%
  Mapped                   5%
  Unevict+Mlocked          0%
  Dirty+Writeback          0%
  NFS+Bounce               0%



Know your Linux memory usage

I’ve made another attempt to understand the Linux memory usage statistics which are exported by “/proc/meminfo”. The main goal was to find out how much memory is still available for allocation by user applications, and also to understand the memory usage of the different Linux subsystems like the kernel, file system cache, etc.

The whole analysis was done on 649 machines with different workloads. Most of them are running a 64-bit Linux kernel 3.2.59, while the others are running a 64-bit kernel with versions between 3.2.42 and 3.2.60. This is an LTS Linux kernel.

Cached memory and tmpfs

One of the most astonishing facts that I found out was that the “tmpfs” in-memory file system is counted against the “Cached” value in “/proc/meminfo”. The “tmpfs” file system is commonly mounted in “/dev/shm”. There is a separate stats key “Shmem” which shows the amount of memory allocated to “tmpfs” files, but at the same time this memory is added to the value of “Cached”. This is rather confusing because we’re used to thinking that both “Cached” and “Buffers” are reclaimable memory which can be freed at any time (or very soon) for use by applications or the kernel. That’s the current behavior of the “free” memory statistics tool, which is widely used by system administrators to get an overview of the memory usage on their Linux systems. The “free” tool is part of the “procps” package, which as of today is still not fixed and shows the “Cached” memory as free.

Here is a little proof:

famzah@vbox:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          2488        965       1523          0         83        517
-/+ buffers/cache:        363       2124
Swap:         1565          0       1565

famzah@vbox:~$ dd if=/dev/zero of=/dev/shm/bigfile bs=1M count=800
838860800 bytes (839 MB) copied, 0.53357 s, 1.6 GB/s

famzah@vbox:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          2488       1765        723          0         83       1317
-/+ buffers/cache:        364       2124
Swap:         1565          0       1565

The “cached” memory got 800 MB bigger. The free “-/+ buffers/cache” memory is still the same amount (2124 MB), which is wrong because the memory used by “tmpfs” can be neither reclaimed nor given to user applications or the kernel.

“tmpfs” is added to the “cached” memory usage because it is implemented in the page cache, which is accounted in the “cached” stats.
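
A more realistic figure for the cache that can really be dropped is therefore “Buffers” + “Cached” minus “Shmem”. The helper below is a minimal sketch of that subtraction. The function name is my own, and it reads “/proc/meminfo”-style text from standard input.

```shell
# reclaimable_cache_kb: prints Buffers + Cached - Shmem (in kB)
# for the /proc/meminfo-style text given on stdin.
reclaimable_cache_kb() {
    awk '
        $1 == "Buffers:" { buffers = $2 }
        $1 == "Cached:"  { cached  = $2 }   # does not match "SwapCached:"
        $1 == "Shmem:"   { shmem   = $2 }
        END { printf "%d\n", buffers + cached - shmem }
    '
}

# Usage on a live system:
#   reclaimable_cache_kb < /proc/meminfo
```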

Active, Inactive, and Slab

The following is true for all machines that I tested on:

  • Active = Active(anon) + Active(file)
  • Inactive = Inactive(anon) + Inactive(file)
  • Slab = SReclaimable + SUnreclaim

Total memory usage

I tried to sum up the usage of the different fields in “/proc/meminfo” up to the maximum amount of available memory on each machine. The following two formulas both represent the whole memory usage:

  • MemTotal = MemFree + (Buffers + Cached + SwapCached) + AnonPages + (Slab + PageTables + KernelStack)
  • MemTotal = MemFree + (Active + Inactive) + (Slab + PageTables + KernelStack)

Both those formulas can be expanded to include the sub-components as well:

  • MemTotal = MemFree + (Buffers + Cached + SwapCached) + AnonPages + ((SReclaimable + SUnreclaim) + PageTables + KernelStack)
  • MemTotal = MemFree + ((“Active(anon)” + “Active(file)”) + (“Inactive(anon)” + “Inactive(file)”)) + ((SReclaimable + SUnreclaim) + PageTables + KernelStack)

The formulas give an error of up to -3% for only 6 machines. For the rest of the machines the error is within +/- 1.5%. The less total memory a machine has, the bigger the error. I suppose that this small error comes from the fact that the “/proc/meminfo” counters are not all updated atomically at once.
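
If you want to check the first formula on your own machine, the sketch below sums the fields and prints the deviation from “MemTotal” in percent. The function name is my own, and it reads “/proc/meminfo”-style text from standard input.

```shell
# meminfo_formula_error: prints the error (in percent) of
#   MemFree + (Buffers + Cached + SwapCached) + AnonPages
#           + (Slab + PageTables + KernelStack)
# relative to MemTotal.
meminfo_formula_error() {
    awk '
        { sub(":", "", $1); v[$1] = $2 }   # "Key:  123 kB" -> v["Key"] = 123
        END {
            sum  = v["MemFree"] + v["Buffers"] + v["Cached"] + v["SwapCached"]
            sum += v["AnonPages"] + v["Slab"] + v["PageTables"] + v["KernelStack"]
            printf "%.2f\n", 100 * (sum - v["MemTotal"]) / v["MemTotal"]
        }
    '
}

# Usage on a live system:
#   meminfo_formula_error < /proc/meminfo
```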

Statistical keys

The following keys in “/proc/meminfo” seem to be accounted into other super-set keys, and are therefore only for statistical purposes:

  • Mapped
  • Dirty
  • Writeback
  • Unevictable
  • Shmem — accounted in “Cached”
  • VmallocUsed



Private networking per-process in Linux

This is a follow-up of the Private /tmp mount per-process in Linux. As already stated there, Linux namespaces offer great options for security.

In this article we will demonstrate the use of the “network” namespace which enables a process to have independent IPv4 and IPv6 stacks, network interfaces, IP routing tables, iptables firewall rules, the /proc/net and /sys/class/net directory trees, sockets, etc.

Here is a diagram to illustrate the concept:
Linux network namespace

First we start by creating a pair of “veth” network interfaces:

ip link add v-eth1 type veth peer name v-peer1
ip link set v-eth1 up
ip link set v-peer1 up

One of those interfaces will be used as a communication point from the side of the original default network namespace. We will assign “” for IP address:

ifconfig v-eth1 netmask up

It is time to enter the new network namespace. Once we have created the new namespace, we will associate the second interface “v-peer1” with it, then we will configure an IP address “” and add a default route through the first interface which will act as a router:

export MAIN_NS_PID="$$"
unshare -n /bin/bash

# We are in a "/bin/bash" session in the NEW network namespace now.

ip link set lo up # activate the "loopback" interface

nsenter --net="/proc/$MAIN_NS_PID/ns/net" ip link set v-peer1 netns "$$" # join "v-peer1" into this namespace
ifconfig v-peer1 netmask up
route add default gw dev v-peer1

# Setup is done.
# You can now drop privileges and launch a daemon which will use this confined network namespace.

sudo -u www-data /etc/init.d/my-net-daemon start

The original default namespace, our original Linux installation, must be configured to act as a router. Otherwise the processes inside the new network namespace won’t have any Internet access. Configuring a Linux network router is a straightforward task:

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -P FORWARD DROP
iptables -F FORWARD

iptables -t nat -F
iptables -t nat -A POSTROUTING -s -o eth0 -j MASQUERADE

iptables -A FORWARD -i eth0 -o v-eth1 -j ACCEPT
iptables -A FORWARD -o eth0 -i v-eth1 -j ACCEPT

Finally, you can enable inbound connections to the processes in the confined new network namespace. Let’s assume that you have a daemon listening on TCP port 10105. Here is how you can forward any new incoming connections to the processes inside the new network namespace:

iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 10105 -j DNAT --to-destination
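
The address-dependent commands above can be generated for any pair of addresses. The sketch below is a dry run which only prints them. The function name is my own, and the 192.0.2.x addresses are placeholders from a documentation range, not the addresses used in my setup; substitute a private range of your own.

```shell
# print_netns_addressing HOST_IP NS_IP NETMASK
# Dry run: only prints the address-dependent commands, runs nothing.
print_netns_addressing() {
    host_ip="$1"; ns_ip="$2"; netmask="$3"
    echo "ifconfig v-eth1 $host_ip netmask $netmask up"      # default namespace side
    echo "ifconfig v-peer1 $ns_ip netmask $netmask up"       # new namespace side
    echo "route add default gw $host_ip dev v-peer1"         # route via the host
    echo "iptables -t nat -A POSTROUTING -s $ns_ip -o eth0 -j MASQUERADE"
    echo "iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 10105 -j DNAT --to-destination $ns_ip"
}

# Placeholder addresses, purely for illustration:
print_netns_addressing 192.0.2.1 192.0.2.2 255.255.255.0
```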

Pros: Using separate network namespaces gives us full network isolation and control over a group of processes. Additionally, we can match incoming packets against a process, which is not possible in a standard “iptables” setup using the “-m owner” match extension. These are huge security benefits.

Cons: The technical implications are that the Linux host has to do (a lot) more work because of the DNAT/SNAT operations and their related connection tracking overhead. If you are running a high traffic server, you should plan and test accordingly. Furthermore, one additional pair of network interfaces is created for each new network namespace. Linux can handle hundreds of network devices, but this is still a factor to be considered.

The better security features outweigh the drawbacks in most use-cases though. Last but not least, it is very easy to run a process with a completely detached network, and this won’t cost us anything on the Linux host.



Private /tmp mount per-process in Linux

I’ve been playing with Linux namespaces and the results are very satisfying. This process isolation has several benefits:

  • The setup is automatically destroyed when the process and its children exit — easy maintenance.
  • Non-privileged processes cannot alter the setup — great security.
  • The isolated resource type is completely invisible by processes in other namespaces — great security.
  • The setup is inherited by any forked children — great for security and maintenance.

If you review the man page of the “unshare” command or syscall, you will see that currently we can have the following private namespaces:

  • mount namespace — mounting and unmounting filesystems will not affect rest of the system, except for filesystems which are explicitly marked as shared
  • UTS namespace — setting hostname, domainname will not affect rest of the system
  • IPC namespace — the process will have independent namespace for System V message queues, semaphore sets and shared memory segments
  • network namespace — the process will have independent IPv4 and IPv6 stacks, IP routing tables, iptables firewall rules, the /proc/net and /sys/class/net directory trees, sockets, etc.
  • pid namespace (new) — children will have a distinct set of PID to process mappings from their parent
  • user namespace (new) — the process will have a distinct set of UIDs, GIDs and capabilities

In this article we will demonstrate the use of the “mount” namespace which lets us mount a filesystem per-process without affecting the rest of the system. Using such a private mount for “/tmp” has mainly security but also usability benefits.

Here are all the commands which you need, in order to start a process with a private “/tmp” directory:

TARGET_USER='www-data' # the non-privileged user who will run the command
TARGET_CMD='/bin/bash' # but it can be any command
NEWTMP="$(mktemp -d)" # securely create a new empty tmp folder

chown "root:$TARGET_USER" "$NEWTMP"
chmod 770 "$NEWTMP"

unshare --mount -- /bin/bash -c "mount -o bind,noexec,nosuid,nodev '$NEWTMP' /tmp && sudo -u '$TARGET_USER' $TARGET_CMD"

A longer version with more explanations follows:

# setup operations done as "root"

root@vbox:~# TARGET_USER='www-data'
root@vbox:~# TARGET_CMD='/bin/bash' # but it can be any command
root@vbox:~# NEWTMP="$(mktemp -d)" # securely create a new empty tmp folder
root@vbox:~# chown "root:$TARGET_USER" "$NEWTMP"
root@vbox:~# chmod 770 "$NEWTMP"

# review the result in the real file-system "/tmp"

root@vbox:~# echo $NEWTMP

root@vbox:~# ls -la /tmp
total 60
drwxrwxrwt 12 root   root     12288 Jun  4 13:53 .
drwxr-xr-x 23 root   root      4096 Jan 24 15:31 ..
drwxrwxrwt  2 root   root      4096 Jun  1 22:54 .ICE-unix
drwxrwx---  2 root   www-data  4096 Jun  4 13:53 tmp.IyoUhputAW

root@vbox:~# ls -la "$NEWTMP"
total 16
drwxrwx---  2 root www-data  4096 Jun  4 13:53 .
drwxrwxrwt 12 root root     12288 Jun  4 13:53 ..

# start the non-privileged process with a private "/tmp" mount

root@vbox:~# unshare --mount -- /bin/bash -c "mount -o bind,noexec,nosuid,nodev '$NEWTMP' /tmp && sudo -u '$TARGET_USER' $TARGET_CMD"

# sample operations done inside the non-privileged process

www-data@vbox:~$ ls -la / | grep tmp
drwxrwx---   2 root www-data  4096 Jun  4 13:53 tmp

www-data@vbox:~$ touch /tmp/test-www-data-file

www-data@vbox:~$ ls -la /tmp # the process has a private "/tmp" mount
total 8
drwxrwx---  2 root     www-data 4096 Jun  4 13:55 .
drwxr-xr-x 23 root     root     4096 Jan 24 15:31 ..
-rw-r--r--  1 www-data www-data    0 Jun  4 13:55 test-www-data-file

# see the result in the real file-system "/tmp"

root@vbox:~# ls -la /tmp
total 60
drwxrwxrwt 12 root   root     12288 Jun  4 13:53 .
drwxr-xr-x 23 root   root      4096 Jan 24 15:31 ..
drwxrwxrwt  2 root   root      4096 Jun  1 22:54 .ICE-unix
drwxrwx---  2 root   www-data  4096 Jun  4 13:55 tmp.IyoUhputAW

root@vbox:~# echo "$NEWTMP"

root@vbox:~# ls -la "$NEWTMP"
total 16
drwxrwx---  2 root     www-data  4096 Jun  4 13:55 .
drwxrwxrwt 12 root     root     12288 Jun  4 13:53 ..
-rw-r--r--  1 www-data www-data     0 Jun  4 13:55 test-www-data-file

Note that we are mounting a directory inside another directory using the “bind” mount feature of Linux.



ProFTPD inheritance of “.ftpaccess” files

ProFTPD and Apache have a lot in common in their concept for inheriting the per-directory settings files. ProFTPD and Apache use “.ftpaccess” and “.htaccess” files respectively.

There is one substantial difference though. ProFTPD does not always look from the current directory to the very root “/” directory, because ProFTPD uses chroot() when the “DefaultRoot” directive is set to “~”, for example. Many distributions set this directive to “~” by default.

Let’s have an example:

  • “/home/$user/.ftpaccess” — a “.ftpaccess” file EXISTS
  • “/home/$user/private/files/” — no “.ftpaccess” file

Let’s also assume that an FTP user has “/home/$user/private/files” for a home directory in the “/etc/passwd” file. If this user logs in, ProFTPD will chroot() to the user’s home directory first. This will effectively make the per-session “/” root directory to start from “/home/$user/private/files”. Therefore, the ProFTPD process will have access to the following directories:

  • “/home/$user/private/files/and-any-subdirectories” — accessible as “/and-any-subdirectories”
  • “/home/$user/private/files/” — accessible as “/”
  • “/home/$user/private/” — not accessible
  • “/home/$user/” — not accessible (“/home/$user/.ftpaccess” — EXISTS but not accessible)
  • “/home/” — not accessible
  • “/” — not accessible

As a result, even though we have the file “/home/$user/.ftpaccess”, it won’t take effect for the FTP user who has “/home/$user/private/files/” as their home directory.
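
The effect can be sketched with a small helper that lists which “.ftpaccess” candidates remain reachable between a directory and the chroot() root. This is only an illustration of the path logic described above, under the assumption that the lookup walks upwards one directory at a time. It is not ProFTPD's own code, and the function name is made up.

```shell
# visible_ftpaccess CHROOT DIR
# Prints the real paths of the ".ftpaccess" candidates from DIR up to CHROOT.
visible_ftpaccess() {
    root="${1%/}"; dir="${2%/}"
    case "$dir/" in
        "$root"/*) ;;                                   # DIR is inside the chroot
        *) echo "outside chroot" >&2; return 1 ;;
    esac
    while :; do
        echo "$dir/.ftpaccess"
        [ "$dir" = "$root" ] && break                   # cannot go above the chroot
        dir="${dir%/*}"                                 # go one level up
    done
}

visible_ftpaccess /home/user/private/files /home/user/private/files/sub
```

The output lists the candidates in “sub” and in “files” only. “/home/user/.ftpaccess” never shows up, because it lies above the chroot() root.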

An strace excerpt follows which shows the exact steps which ProFTPD performs when it logs in an FTP user with “DefaultRoot” setting configured to “~”:

root@mysrv:~# getent passwd nobody
root@mysrv:~# getent group 99

root@mysrv:~# getent passwd ftpuser1
root@mysrv:~# getent group 4388

root@mysrv:~# strace -tt -f -p 3442
# main ProFTP process (PID=3442) accepts the new connection and forks a child process to handle it
3442  14:36:58.767425 accept(2, 0, NULL) = 7
3442  14:36:58.768078 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xec66f728) = 9389

# ProFTP child (PID=9389) becomes "root" temporarily
9389  14:36:58.771897 setresuid32(-1, 0, -1) = 0
9389  14:36:58.772026 setresgid32(-1, 0, -1) = 0

# open "syslog"
# read server's SSL CA, public and private files
# open "/var/log/proftpd/extended.log"
# manage the "/var/run/proftpd.scoreboard" file

# child becomes "nobody"
9389  14:36:58.785372 setresgid32(-1, 99, -1) = 0
9389  14:36:58.785421 setresuid32(-1, 99, -1) = 0

# log the connection and send a greeting message to the connected FTP client
# FTP client sends login information
9389  14:37:00.964572 read(0, "USER ftpuser1\r\n", 4102) = 14
9389  14:37:02.178714 read(0, "PASS SOME-PASS-INPUT\r\n", 4102) = 20

# ProFTP child becomes "root" temporarily
9389  14:37:02.192030 setresuid32(-1, 0, -1) = 0
9389  14:37:02.192090 setresgid32(-1, 0, -1) = 0

# open "/etc/shadow" to compare the supplied username and password
# open "/etc/shells" (temporarily switched UID/GID back to "nobody")
# open "/etc/ftpusers"
# open "/var/log/wtmp" to add a login entry there
# open "/var/log/xferlog"

# set any additional user "ftpuser1" groups
9389  14:37:02.201134 setgroups32(1, [4388]) = 0
9389  14:37:02.201301 setgid32(4388)    = 0

# chroot() to user's home directory (we are still with "root" privileges)
9389  14:37:02.208208 chroot("/home/ftpuser1/private/public_upload") = 0

# drop privileges to user's UID/GID
9389  14:37:02.209190 setgid32(4388)    = 0
9389  14:37:02.209253 setresuid32(-1, 5178, -1) = 0

# check for any ".ftpaccess" files
9389  14:37:02.209841 stat64("/.ftpaccess", 0xf640ddd0) = -1 ENOENT (No such file or directory)


Enable Forward Secrecy in Apache 2.4

The Heartbleed bug once again raised a very serious question: what happens if someone steals our SSL private key? The future actions are all clear: we revoke the key, change it on the server, and we are secure. However, any recorded past SSL-encrypted sessions which the attacker possesses can be decrypted to plain text using the stolen SSL private key.

The article “SSL: Intercepted today, decrypted tomorrow” explains the problem in great detail. It also suggests who may take advantage of your stolen SSL private key.

Fortunately there is a solution to this attack vector. It is called Forward Secrecy and solves the problem by using a different private key to encrypt each new SSL session. If an attacker wanted to decrypt all your SSL sessions, the attacker would need to brute-force the private keys of each of your SSL sessions. While this attack vector still exists, current computing power is too small to solve such a task in a reasonable time. Note that Forward Secrecy is not new at all and was invented in 1992, pre-dating the SSL protocol by two years, as stated in the “SSL: Intercepted today, decrypted tomorrow” article.

Many websites and blogs recommend their own cipher list which needs to be used in order to enable Forward Secrecy and at the same time not be vulnerable to the BEAST attack. Trusting such long lists of ciphers without understanding all of them makes me a bit uneasy. Luckily, the OpenSSL developers have created a special string which we can use to identify the TLS v1.2 specific ciphers: “TLSv1.2”. This makes your Apache configuration very readable and leaves the control in the hands of the OpenSSL developers, who can update the ciphers accordingly:

SSLProtocol -ALL -SSLv2 +SSLv3 +TLSv1 +TLSv1.1 +TLSv1.2
SSLHonorCipherOrder on
SSLCipherSuite TLSv1.2:RC4:HIGH:!aNULL:!eNULL:!MD5
SSLCompression off

TraceEnable Off

The effect of this configuration is that for TLS v1.2 connections you will prefer the strong TLS v1.2 ciphers. For all other connections, like SSLv3 or TLS v1.0, the RC4 ciphers will be used which are required to protect against the BEAST attack. Note that TLS v1.2 does not suffer from the BEAST attack.

This configuration won’t give you Forward Secrecy for Internet Explorer. The explanation here says:
“Internet Explorer, in all versions, does not support the ECDHE and RC4 combination (which has the benefit of supporting Forward Secrecy and being resistant to BEAST). But IE has long patched the BEAST vulnerability and so we shouldn’t worry about it.”

It’s worth mentioning that using those strong TLS v1.2 ciphers may triple the CPU time required to establish a new SSL connection.


Cron job custom timezone

The default cron job daemon on Debian and Ubuntu does not support per-user timezones (see crontab(5) man page).

Here is a solution which runs hourly cron tasks in the timezone which you specified. For example, if you want to run “test.sh” at 5 AM and 7 PM in timezone Europe/Sofia, you need to create the following “cron.hourly” script:

/path/to/run-at.sh 5,19 Europe/Sofia $(( 15*60 )) /var/run/run-at-test.state /path/to/test.sh

Here is the source code of “run-at.sh”:

#!/bin/bash
set -u

# We work exclusively with global variables.
# Functions are used just to separate logic and for self-documenting.

function display_usage() {
	if [ "$1" -ge 5 ]; then
		return; # enough parameters
	fi

	cat >&2 <<EOF
Usage: $0 HOURS TZ WARN_TIME STATE_FILE COMMAND [ARGS]

Runs COMMAND every day at the specified HOURS.

The execution is accounted and considered successful
only if COMMAND exits with 0. If there was no (successful)
execution within the last WARN_TIME minutes,
a warning is issued.

The STATE_FILE must pre-exist, so make sure that you
create it before the first run, or you will get a
warning.

This script is intended to be run in "cron.hourly".

Parameters:
 - HOURS -- comma-separated list; example: 5,12,19
 - TZ -- time zone; example: Europe/Sofia
 - WARN_TIME -- minutes; example: 360
 - STATE_FILE -- full path to a writable file
 - COMMAND and ARGS to be executed on HOURS

Example:
 $0 5,19 Europe/Sofia \\
   \$(( 15*60 )) /root/.run-at-test.state \\
   date -d now +%c
EOF
	exit 1
}

function check_state_file_oldness() {
	# find file with mtime less than $WARN_TIME minutes
	if [ "$(find "$STATE_FILE" -type f -mmin -"$WARN_TIME")" != "$STATE_FILE" ]; then
		# either of the following happened:
		#   file not found -> this will be indicated by 'find' on STDERR too
		#   file is too old ("-mmin" condition not met)
		echo "WARNING: No successful run in the last $WARN_TIME minutes" >&2
	fi
}

function split_hours() {
	HOURS="${HOURS//,/ }" # replace "," with " "
	# https://blog.famzah.net/2013/02/17/
	#   bash-split-a-string-into-columns-by-white-space-without-invoking-sub-shells/
	HOURS=( $HOURS ) # now an ARRAY
}

function validate_tz() {
	# naive check; tested on Debian
	if [ ! -e "/usr/share/zoneinfo/$WANT_TZ" ]; then
		echo "ERROR: TZ seems to be invalid." >&2
		exit 1
	fi
}

function get_now_hour_in_tz() {
	NOW_HOUR="$(TZ="$WANT_TZ" date +%H)" # get current hour in 24h-format using the $WANT_TZ
}

function check_if_we_should_run_or_exit() {
	RUN=0
	for h in "${HOURS[@]}" ; do
		if [ "$h" -eq "$NOW_HOUR" ]; then
			RUN=1
		fi
	done

	if [ "$RUN" -eq 0 ]; then
		exit 0
	fi
}

function execute_command_and_get_exit_code() {
	"$@" # execute the command
	EC="$?" # remember its exit code
}

function update_state_file_mtime() {
	if [ "$EC" -eq 0 ]; then
		touch "$STATE_FILE"
	fi
}

#### ### ### ###

display_usage "$#"

# parse_argv

HOURS="$1" ; shift
WANT_TZ="$1" ; shift
WARN_TIME="$1" ; shift
STATE_FILE="$1" ; shift
# the rest in "$@" is the command to be executed

check_state_file_oldness
split_hours
validate_tz
get_now_hour_in_tz
check_if_we_should_run_or_exit

execute_command_and_get_exit_code "$@"