Enable Forward Secrecy in Apache 2.4

The Heartbleed bug once again raised a very serious question — what happens if someone steals our SSL private key? The actions going forward are clear — we revoke the certificate, install a new key on the server, and we are secure again. However, any recorded past SSL-encrypted sessions which the attacker possesses can be decrypted to plain text using the stolen SSL private key.

The article “SSL: Intercepted today, decrypted tomorrow” explains the problem in great detail. It also suggests who may take advantage of your stolen SSL private key.

Fortunately, there is a solution to this attack vector. It is called Forward Secrecy and it solves the problem by using a different private key to encrypt each new SSL session. If an attacker wanted to decrypt all your SSL sessions, they would need to brute-force the private key of every single session. While this attack vector still exists, the computing power available today is far too small to solve such a task in a reasonable time. Note that Forward Secrecy is not new at all — it was invented in 1992, pre-dating the SSL protocol by two years, as stated in the “SSL: Intercepted today, decrypted tomorrow” article.

Many websites and blogs recommend their own cipher list which must be used in order to enable Perfect Forward Secrecy while not being vulnerable to the BEAST attack at the same time. Trusting such long lists of ciphers without understanding all of them makes me a bit uneasy. Luckily, the OpenSSL developers have created a special string which we can use to identify the TLS v1.2 specific ciphers — “TLSv1.2”. This makes your Apache configuration very readable and leaves the control to the OpenSSL developers, who can update the cipher list accordingly:

SSLProtocol -ALL -SSLv2 +SSLv3 +TLSv1 +TLSv1.1 +TLSv1.2
SSLHonorCipherOrder on
SSLCipherSuite TLSv1.2:RC4:HIGH:!aNULL:!eNULL:!MD5
SSLCompression off

TraceEnable Off

The effect of this configuration is that for TLS v1.2 connections the strong TLS v1.2 ciphers will be preferred. For all other connections, like SSLv3 or TLS v1.0, the RC4 ciphers will be preferred, which is required to protect against the BEAST attack. Note that TLS v1.2 does not suffer from the BEAST attack.
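
If you want to check what a client actually negotiates against this configuration, a quick test with the OpenSSL command-line client can help. Replace www.example.com with your own host name; this is only an illustration and not part of the Apache setup:

# ask the server to speak TLS v1.2 and print the negotiated protocol and cipher
openssl s_client -connect www.example.com:443 -tls1_2 < /dev/null 2>/dev/null | egrep 'Protocol|Cipher'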

This configuration won’t give you Forward Secrecy for Internet Explorer. The explanation here says:
“Internet Explorer, in all versions, does not support the ECDHE and RC4 combination (which has the benefit of supporting Forward Secrecy and being resistant to BEAST). But IE has long patched the BEAST vulnerability and so we shouldn’t worry about it.”

It’s worth mentioning that using these strong TLS v1.2 ciphers may increase the CPU time required to establish a new SSL connection by about three times.


Cron job custom timezone

The default cron job daemon on Debian and Ubuntu does not support per-user timezones (see crontab(5) man page).

Here is a solution which runs hourly cron tasks in a time zone which you specify. For example, if you want to run “test.sh” at 5 AM and 7 PM in the Europe/Sofia time zone, you need to create the following “cron.hourly” script:

#!/bin/bash
/path/to/run-at.sh 5,19 Europe/Sofia $(( 15*60 )) /var/run/run-at-test.state /path/to/test.sh

Here is the source code of “run-at.sh”:

#!/bin/bash
set -u

# XXX
# We work exclusively with global variables.
# Functions are used just to separate logic and for self-documenting.

function display_usage() {
	if [ "$1" -ge 5 ]; then
		return; # enough parameters
	fi

	cat >&2 <<EOF
Usage: $0 HOURS TZ WARN_TIME STATE_FILE COMMAND [ARGS...]
Runs COMMAND every day at the specified HOURS.

The execution is accounted and considered successful
only if COMMAND exits with 0. If there was no successful
execution within the last WARN_TIME minutes,
a warning is issued.

The STATE_FILE must pre-exist, so make sure that you
create it before the first run, or you will get a
warning.

This script is intended to be run in "cron.hourly".

Arguments:
 - HOURS -- comma-separated list; example: 5,12,19
 - TZ -- time zone; example: Europe/Sofia
 - WARN_TIME -- minutes; example: 360
 - STATE_FILE -- full path to a writable file
 - COMMAND and ARGS to be executed on HOURS

Example:
 $0 5,19 Europe/Sofia \\
   \$(( 15*60 )) /root/.run-at-test.state \\
   date -d now +%c
EOF
	exit 1
}

function check_state_file_oldness() {
	# find file with mtime less than $WARN_TIME minutes
	if [ "$(find "$STATE_FILE" -type f -mmin -"$WARN_TIME")" != "$STATE_FILE" ]; then
		# file not found -> this will be indicated by 'find' on STDERR too
		# file is too old ("-mmin" condition not met)
		echo "WARNING: No successful run in the last $WARN_TIME minutes" >&2
	fi
}

function split_hours() {
	HOURS="${HOURS//,/ }" # replace "," with " "
	# http://blog.famzah.net/2013/02/17/
	#   bash-split-a-string-into-columns-by-white-space-without-invoking-sub-shells/
	HOURS=( $HOURS ) # now an ARRAY
}

function validate_tz() {
	# naive check; tested on Debian
	if [ ! -e "/usr/share/zoneinfo/$WANT_TZ" ]; then
		echo "ERROR: TZ seems to be invalid." >&2
		exit 1
	fi
}

function get_now_hour_in_tz() {
	NOW_HOUR="$(TZ="$WANT_TZ" date +%H)" # get current hour in 24h-format using the $WANT_TZ
}

function check_if_we_should_run_or_exit() {
	RUN=0
	for h in "${HOURS[@]}" ; do
		if [ "$h" -eq "$NOW_HOUR" ]; then
			RUN=1
			break
		fi
	done

	if [ "$RUN" -eq 0 ]; then
		exit 0
	fi
}

function execute_command_and_get_exit_code() {
	"$@" # execute the command
	EC="$?"
}

function update_state_file_mtime() {
	if [ "$EC" -eq 0 ]; then
		touch "$STATE_FILE"
	fi
}

#### ### ### ###

display_usage "$#"

# parse_argv

HOURS="$1" ; shift
WANT_TZ="$1" ; shift
WARN_TIME="$1" ; shift
STATE_FILE="$1" ; shift
# the rest in "$@" is the command to be executed

check_state_file_oldness

split_hours
validate_tz
get_now_hour_in_tz
check_if_we_should_run_or_exit

execute_command_and_get_exit_code "$@"
update_state_file_mtime
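
To complete the setup from the example above, make the “cron.hourly” script executable and pre-create the state file, so that the very first run does not trigger a warning. The script name “run-at-test” below is just an example:

chmod +x /etc/cron.hourly/run-at-test # example name; note that "run-parts" skips file names containing dots
touch /var/run/run-at-test.state      # the STATE_FILE must pre-exist (see the usage text above)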

Using flock() in Bash without invoking a subshell

The flock(1) utility on Linux manages flock(2) advisory locks from within shell scripts or the command line. This lets you synchronize your Bash scripts with all your other applications written in Perl, Python, C, etc.

I’ll focus on the third usage form where flock() is used inside a Bash script. Here is what the man page suggests:

#!/bin/bash

(
flock -s 200

# ... commands executed under lock ...

) 200>/var/lock/mylockfile

Unfortunately, this invokes a subshell which has the following drawbacks:

  • Variables assigned inside the subshell are not visible in the main shell script.
  • There is a performance penalty.
  • The syntax coloring in “vim” does not work properly. :)

This motivated my colleague zImage to come up with a usage form which does not invoke a subshell in Bash:

#!/bin/bash

exec 200>/var/lock/mylockfile || exit 1
flock -n 200 || { echo "ERROR: flock() failed." >&2; exit 1; }

# ... commands executed under lock ...

flock -u 200
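
Here is a minimal sketch of the main benefit of this form: a variable assigned while the lock is held stays visible after the lock is released, which is not possible with the subshell form (the variable name COUNTER is used only for illustration):

#!/bin/bash

exec 200>/var/lock/mylockfile || exit 1
flock -n 200 || { echo "ERROR: flock() failed." >&2; exit 1; }

COUNTER=42 # assigned under the lock

flock -u 200

echo "COUNTER=$COUNTER" # still prints "COUNTER=42" in the main shell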

Nagios: Improve CPU performance with popen_noshell()

Today I’ll share my real-world experience with popen_noshell() on the Nagios monitoring server which we run at work. We are actively monitoring 1166 hosts and 14250 services. The machine has 6 GB RAM and a single Intel Core i7-950 CPU with multi-threading enabled (8 threads total) and a slight overclock. Besides running Nagios, this machine also handles the incoming data from our custom monitoring systems, processes the RRD database storage, and generates the web interface status and chart output. So it’s a pretty busy machine which does a lot of network activity, and the Nagios daemon is just one part of the CPU load. For example, since boot the main “nagios3” process has used only 20% of the CPU. The rest has been used by the fork()’ed Perl scripts (we use a lot of them for the active checks), the standard Nagios network checks, and the Apache/PHP web server handling the incoming data.

Recently the machine started to exhaust its CPU resources. First we overclocked it a bit, which gave us 10% more CPU idle time. Then we decided to try compiling Nagios with the popen-noshell library. This gave us another 10% of CPU idle time, and now the machine is working great again.

I’ll focus on the popen-noshell integration and results, since CPU overclocking is a well-known topic. Here is the chart which shows the CPU usage before and after we re-compiled Nagios with the popen-noshell library:

[Chart: Nagios server CPU usage before and after re-compiling with the popen-noshell library]

As we can see, the system CPU usage dropped from 38% to 31%, which is an 18% improvement. The user CPU usage dropped from 44% to 41%, which is a 7% improvement. Overall, we gained a 12% speed-up for our workload just by re-compiling Nagios with the popen-noshell library. I must stress that the speed-up depends a lot on your workload. If this machine were busy only with Nagios and the active checks were more CPU efficient (i.e. written in C rather than Perl), the speed-up could have been much higher, since popen_noshell() is about 10 times faster than the standard popen().

Here is a list of the other machine metrics which were also affected by the workload change:

  • Used memory: 39% => 24% (38% less)
  • Load average: 39 => 46 (18% higher)
  • Fork rate: 8*61 => 8*61 (created processes/second – no change)

Here are the steps that you need to perform, in order to re-compile the Nagios Debian package by integrating it with the popen-noshell library:

apt-get install devscripts

apt-get build-dep nagios3-core
# No need to run as "root" from here on
apt-get source nagios3-core

svn checkout http://popen-noshell.googlecode.com/svn/trunk/ popen-noshell

cd nagios3-3.2.1/

# BEGIN: patch Nagios to use popen_noshell_compat()

cp ../popen-noshell/popen_noshell.* base/
vi base/Makefile.in
	OBJS=$(BROKER_O) popen_noshell.o 

vi base/utils.c
	#include "popen_noshell.h"
	
        /* run the command */
        struct popen_noshell_pass_to_pclose pclose_arg;
        fp=(FILE *)popen_noshell_compat(cmd,"r",&pclose_arg);

            /* close the command and get termination status */
            status=pclose_noshell(&pclose_arg);

vi base/checks.c
	# apply the same two changes (popen_noshell_compat and pclose_noshell) as in base/utils.c above

# END: patch Nagios to use popen_noshell_compat()

EDITOR=vim dch -i
	# 3.2.1-2+squeeze1 -> 3.2.1-2+squeeze1-noshell1
	# you must have a trailing number in the added version name
	# after exit, this renames the original directory name

cd ..
mv nagios3_3.2.1.orig.tar.gz nagios3_3.2.1-2+squeeze1.orig.tar.gz

# the source directory was renamed by "dch"
cd nagios3-3.2.1-2+squeeze1/
DEB_BUILD_OPTIONS=nocheck debuild -us -uc

cd ..
sudo dpkg -i nagios3-core_3.2.1-2+squeeze1-noshell1_i386.deb \
	nagios3-common_3.2.1-2+squeeze1-noshell1_all.deb \
	nagios3-cgi_3.2.1-2+squeeze1-noshell1_i386.deb \
	nagios3-doc_3.2.1-2+squeeze1-noshell1_all.deb \
	nagios3_3.2.1-2+squeeze1-noshell1_i386.deb
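
After the packages are installed, it is a good idea to let the re-compiled binary verify the configuration and then restart the daemon. The paths below are the Debian defaults and are assumed here:

nagios3 -v /etc/nagios3/nagios.cfg # verify the configuration with the newly compiled binary
/etc/init.d/nagios3 restart        # restart the daemon so that the new binary is actually used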

An “xargs” alternative to GNU Parallel

I wanted to use GNU Parallel on my Ubuntu system, in order to process some data in parallel. It turned out that there was no official package for Ubuntu at the time. As of Ubuntu Quantal (12.10), this has been corrected and the package is in the official repository.

Reading a bit more brought me to the astonishing fact that “xargs” can run commands in parallel. The “xargs” utility is something I use every day and this parallelism feature made it even more useful.

Let’s try it by running the following:

famzah@vbox:~$ echo 10 20 30 40 50 60 | xargs -n 1 -P 4 sleep

The use of “-n 1” is vital if you want to pass only one command-line argument from the list to each parallel process.

Here is the result:

# right after we launched "xargs"
famzah@vbox:~$ ps f -o pid,command
  PID COMMAND
 5068 /bin/bash
 7007  \_ xargs -n 1 -P 4 sleep
 7008      \_ sleep 10
 7009      \_ sleep 20
 7010      \_ sleep 30
 7011      \_ sleep 40

# 10 seconds later (the first "sleep" has just exited)
famzah@vbox:~$ ps f -o pid,command
  PID COMMAND
 5068 /bin/bash
 7007  \_ xargs -n 1 -P 4 sleep
 7009      \_ sleep 20
 7010      \_ sleep 30
 7011      \_ sleep 40
 7017      \_ sleep 50

# 20 seconds later (the second and third "sleep" commands have exited)
# we now have only 3 simultaneous processes (no more arguments to process)
famzah@vbox:~$ ps f -o pid,command
  PID COMMAND
 5068 /bin/bash
 7007  \_ xargs -n 1 -P 4 sleep
 7011      \_ sleep 40
 7017      \_ sleep 50
 7023      \_ sleep 60
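
As a side note about “-n”: here is a quick way to see how it groups the stdin arguments into separate command invocations. With “-n 2” each process receives two arguments; “-P” is omitted so that the output order is deterministic, and “echo” is used purely for illustration:

famzah@vbox:~$ echo 10 20 30 40 50 60 | xargs -n 2 echo
10 20
30 40
50 60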

It’s worth mentioning that if “xargs” fails to execute the binary, it prematurely terminates the failed parallel processing queue, which leaves some of the stdin arguments not processed:

famzah@vbox:~$ echo 10 20 30 40 50 60 | xargs -n 1 -P 4 badexec-name
xargs: badexec-namexargs: badexec-name: No such file or directory: No such file or directory

xargs: badexec-namexargs: badexec-name: No such file or directory
: No such file or directory

The output is scrambled because all parallel processes write to the screen with no locking synchronization. This seems to be a known issue. The point is that we could expect that “xargs” would try to execute “badexec-name” for every command-line argument (total of six attempts in our example). It turns out that “xargs” bails out the same way even if we don’t use the “-P” option:

# standard usage of "xargs"
famzah@vbox:~$ echo 10 20 30 40 50 60 | xargs -n 1 badexec-name
xargs: badexec-name: No such file or directory

Not a very cool behavior. I’ve reported this as a bug to the GNU community. If you review the responses to the bug report, you will find out that this actually is an intended feature. :)

If the provided command to “xargs” is a valid one but it fails during the execution, there are no surprises and “xargs” continues with the next command-line argument by executing a new command:

famzah@vbox:~$ echo 10 20 30 40 50 60 | xargs -n 1 -P 4 rm
rm: rm: cannot remove `10'cannot remove `40': No such file or directory
: No such file or directory
rm: cannot remove `20': No such file or directory
rm: cannot remove `30': No such file or directory
rm: cannot remove `60': No such file or directory
rm: cannot remove `50': No such file or directory

The output here is scrambled too because all parallel processes write to the screen with no locking synchronization. We see however that all command-line arguments from “10” to “60” were processed by executing a command for each of them.

Bash: Split a string into columns by white-space without invoking a subshell

The classical approach is:

RESULT="$(echo "$LINE"| awk '{print $1}')" # executes in a subshell 

Processing thousands of lines this way, however, fork()’s thousands of processes, which affects performance and makes your script CPU hungry.

Here is the effective solution which I found with my colleagues at work:

COLS=( $LINE ); # parses columns without executing a subshell
RESULT="${COLS[0]}"; # returns first column (0-based indexes)

Here is an example:

LINE="col0 col1  col2     col3  col4      " # white-space including tab chars
COLS=( $LINE ); # parses columns without executing a subshell

echo "${COLS[0]}"; # prints "col0"
echo "${COLS[1]}"; # prints "col1"
echo "${COLS[2]}"; # prints "col2"
echo "${COLS[3]}"; # prints "col3"
echo "${COLS[4]}"; # prints "col4"

If you want to split not by white-space but by any other character, you can temporarily change the IFS variable which determines how Bash recognizes fields and word boundaries.
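
For example, here is a minimal sketch of splitting by a colon instead of white-space, by temporarily changing IFS (the variable names are arbitrary):

LINE="col0:col1:col2"

OLD_IFS="$IFS" # save the default field separators
IFS=':'        # make Bash split fields by ":" instead of white-space
COLS=( $LINE ) # parses columns without executing a subshell
IFS="$OLD_IFS" # restore the default behavior

echo "${COLS[1]}"; # prints "col1"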

Re-compile a Debian kernel as a .deb package

Here is my success story on how to re-compile a Debian/Ubuntu kernel, in order to enable or tune kernel features which are not available as kernel modules:

# Install required software for the kernel compilation
apt-get install fakeroot build-essential devscripts
apt-get build-dep linux-image-$(uname -r) # make sure you have the appropriate "deb-src" in "sources.list"
apt-get install libncurses5-dev # required for "make menuconfig"
apt-get install ccache # to re-compile the kernel faster (http://wiki.debian.org/OverridingDSDT)

# Prepare some environment variables for our architecture, for later use
ARCH=$(uname -r|cut -d- -f3)
CPUCNT=$(( $(cat /proc/cpuinfo |egrep ^processor |wc -l) * 2))

# Get the kernel sources
rm -rf /root/krebuild && mkdir /root/krebuild
cd /root/krebuild
apt-get source linux-image-$(uname -r)
cd linux-$(uname -r|cut -d- -f1|cut -d. -f1-2)* # cd linux-3.2.20

# http://kernel-handbook.alioth.debian.org/ch-common-tasks.html # 4.2.5 Building packages for one flavour
# The target in this command has the general form of target_arch_featureset_flavour. Replace the featureset with none if you do not want any of the extra featuresets.

# Prepare a Debian kernel to compile
fakeroot make -f debian/rules.gen setup_${ARCH}_none_${ARCH} >/dev/null
cd debian/build/build_${ARCH}_none_${ARCH}
make menuconfig # make any kernel config changes now
cd ../../..

# No debug info => faster kernel build
perl -pi -e 's/debug-info:\s+true/debug-info: false/' debian/config/$ARCH/defines
echo binary-arch_${ARCH}_none_${ARCH} # prints the Make target name to look for in "debian/rules.gen"
vi debian/rules.gen # find the Make target and change DEBUG and DEBUG_INFO to False/n respectively

# Bugfix: http://lists.debian.org/debian-user/2008/02/msg01455.html
vi debian/bin/buildcheck.py +51 # add "return 0" right after "def __call__(self, out):"

# Compile the kernel
time DEBIAN_KERNEL_USE_CCACHE=true DEBIAN_KERNEL_JOBS=$CPUCNT \
	fakeroot make -j$CPUCNT -f debian/rules.gen binary-arch_${ARCH}_none_${ARCH} > compile-progress.log

# If needed, the linux-headers-version-common binary package (http://kernel-handbook.alioth.debian.org/ch-common-tasks.html -> 4.2.5)
#fakeroot make -j$CPUCNT -f debian/rules.gen binary-arch_${ARCH}_none_real

# Install the newly compiled kernel
cd ..
dpkg -i linux-image-*.deb
#dpkg -i linux-headers-*.deb # only if you need them and/or have them installed already
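
After rebooting into the new kernel, a simple sanity check confirms which kernel is actually running and which kernel packages are installed:

uname -a                # shows the running kernel version and build date
dpkg -l 'linux-image-*' # lists the installed kernel packages and their versions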

Perl Net::Ping not working properly with ICMP by default

If you tried to ping a host with Perl Net::Ping using the ICMP protocol and that failed, even though the “ping” command-line utility can ping the host, you’re not alone :) I had the same problem and it turned out to be due to the fact that Net::Ping by default sends no DATA in the ICMP request and thus its requests are rather short and non-standard. Here are some tcpdump examples:

  • Net::Ping with ICMP protocol, everything else is defaults: “$p = new Net::Ping(‘icmp’)”, no replies from remote host, note that the length is just 8 bytes:
    12:29:02.898083 IP source-addr > dest-addr: ICMP echo request, id 2194, seq 41, length 8
    12:29:03.711595 IP source-addr > dest-addr: ICMP echo request, id 2194, seq 42, length 8
    
  • Linux “ping” command-line utility, remote host replies accordingly, the length is 64 bytes total:
    12:30:18.278865 IP source-addr > dest-addr: ICMP echo request, id 2488, seq 1, length 64
    12:30:18.289922 IP dest-addr > source-addr: ICMP echo reply, id 2488, seq 1, length 64
    12:30:18.790610 IP source-addr > dest-addr: ICMP echo request, id 2488, seq 2, length 64
    12:30:18.811029 IP dest-addr > source-addr: ICMP echo reply, id 2488, seq 2, length 64
    
  • Net::Ping with ICMP protocol with user-defined length, “$p = new Net::Ping(‘icmp’, 1, 56)”, remote host replies accordingly, the length is 64 bytes total:
    12:30:48.377496 IP source-addr > dest-addr: ICMP echo request, id 2488, seq 6, length 64
    12:30:48.433690 IP dest-addr > source-addr: ICMP echo reply, id 2488, seq 6, length 64
    12:30:48.934310 IP source-addr > dest-addr: ICMP echo request, id 2488, seq 7, length 64
    12:30:48.946152 IP dest-addr > source-addr: ICMP echo reply, id 2488, seq 7, length 64
    

Bottom line is that if you are going to use Net::Ping with ICMP, specify 56 for the “bytes” parameter when creating an instance of the Net::Ping object. This way you will be sending standard ICMP requests with a total length of 64 bytes.
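
If you want to observe the packet lengths yourself while your Perl script is pinging, a capture like the following should do; the interface name “eth0” is just an assumption, so use the one your traffic actually goes through:

tcpdump -n -i eth0 icmp # prints one line per ICMP request/reply, including its length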

Securely avoid SSH warnings for changing IP addresses

If you have servers that change their IP address, you’re probably already used to the following SSH warning:

The authenticity of host '176.34.91.245 (176.34.91.245)' can't be established.
...
Are you sure you want to continue connecting (yes/no)? yes

Besides being annoying, it is also a security risk to blindly accept this warning and continue connecting. And let’s be honest — almost none of us checks the fingerprint in advance every time.

A common scenario for this use case is when you have an EC2 server in Amazon AWS which you temporarily stop and then start, in order to cut costs. I have a backup server which I use in this way.

In order to securely avoid this SSH warning and still be sure that you connect to your trusted server, you have to save the fingerprint in a separate file and update the IP address in it every time before you connect. Here are the connect commands, which you can also encapsulate in a Bash wrapper script:

IP=176.34.91.245 # use an IP address here, not a hostname
FPFILE=~/.ssh/aws-backup-server.fingerprint

test -e "$FPFILE" && perl -pi -e "s/^\S+ /$IP /" "$FPFILE"
ssh -o StrictHostKeyChecking=ask -o UserKnownHostsFile="$FPFILE" root@$IP

Note that the FPFILE is not required to exist on the first SSH connect. The first time you connect to the server, the FPFILE will be created when you accept the SSH warning. Further connects will not show an SSH warning or ask you to accept the fingerprint again.
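
Here is a minimal sketch of such a Bash wrapper script; the file name “ssh-aws-backup.sh” is just an example, and the rest mirrors the commands above:

#!/bin/bash
# ssh-aws-backup.sh -- connect to the backup server by its current IP,
# keeping its SSH fingerprint pinned in a dedicated file

IP="$1" # the current IP address of the server, given as the first argument
[ -n "$IP" ] || { echo "Usage: $0 IP-ADDRESS" >&2; exit 1; }

FPFILE=~/.ssh/aws-backup-server.fingerprint

test -e "$FPFILE" && perl -pi -e "s/^\S+ /$IP /" "$FPFILE"
exec ssh -o StrictHostKeyChecking=ask -o UserKnownHostsFile="$FPFILE" "root@$IP"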

iSCSI-over-Internet performance notes

I recently played a bit with iSCSI over Internet, in order to design and implement the Locally encrypted secure remote backup over Internet.

My initial impression was that iSCSI over Internet is not usable as a backup device, even though my Internet connection is relatively fast — a simple ext4 file-system format took about 24 minutes. I thought that the connection latency was killing the performance. Well, I was wrong. Even after halving the latency by working with a server which was geographically closer, the ext4 format still took 24 minutes.

Eventually I did some tests and analysis, and finally started to use the iSCSI over Internet volume for backup purposes — and it works flawlessly so far.

Ext4 format benchmark

It turned out that it was not the latency but my upload bandwidth which was slowing things down:

  • 1 Mbit/s upload Internet connection and Ping latency of 75 ms:
    • Time: 24 minutes.
    • Average transfer rates snapshot:
      • Total rates: 967.7 kbits/sec (212.6 packets/sec).
      • Incoming rates: 83.0 kbits/sec (92.8 packets/sec).
      • Outgoing rates: 884.6 kbits/sec (119.8 packets/sec).
    • About 200 MBytes outgoing transfer; only 12 MBytes incoming transfer (no SSH tunnel compression).
    • About 200,000 packets sent and about 130,000 received.
  • 3 Mbit/s upload Internet connection and Ping latency of 75 ms:
    • Time: 8 minutes.
    • Average transfer rates snapshot:
      • Total rates: 2580.0 kbits/sec (417.8 packets/sec).
      • Incoming rates: 128.5 kbits/sec (149.6 packets/sec).
      • Outgoing rates: 2451.5 kbits/sec (268.2 packets/sec).
    • About 160 MBytes outgoing transfer; only 9 MBytes incoming transfer (with SSH tunnel compression).
    • About 140,000 packets sent and about 80,000 received.

I know I’m missing two comparable tests with and without SSH tunnel compression, but it seems that compression doesn’t make much of a difference. It’s the upload speed which affects the total completion time.

File copy benchmark

All tests were done without SSH compression, and we reach the same conclusion — it is the bandwidth which affects the total completion time:

  • 1 Mbit/s upload Internet connection and Ping latency of 75 ms:
    • SSH direct file copy to server: 100 seconds (11 MBytes file).
    • File copy to an iSCSI mounted file-system: 105 seconds.
  • 3 Mbit/s upload Internet connection and Ping latency of 75 ms:
    • SSH direct file copy to server: 39 seconds (11 MBytes file).
    • File copy to an iSCSI mounted file-system: 39 seconds.

The SSH direct file copy (SCP) transfer command was “scp testf root@172.18.0.1:/tmp/”, and the file copy command was “cp testf /mnt/ ; sync”.

Server and client load during transfer, other benchmarks

During the transfer, both the client and server machines were almost idle in terms of CPU. The iSCSI block storage device on the server was utilized at less than 1%.

Note that the iSCSI target was exported via an SSH tunnel, as described here. Ping tests showed no difference between a direct ping to the server and a ping via the SSH tunnel.
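
For reference, such a tunnel can be built with a plain SSH local port forward to the standard iSCSI port 3260. This is only a sketch of the idea, not necessarily the exact setup used here; the host name and the compression switch “-C” are assumptions:

# forward local port 3260 to the iSCSI target running on the remote server
ssh -f -N -C -L 3260:127.0.0.1:3260 root@backup-server.example.com
# then point the local iSCSI initiator at 127.0.0.1:3260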

The file copy tests were done on a regular iSCSI mounted volume, and on an iSCSI volume which was encrypted using TrueCrypt. The same speeds were achieved.

Encountered problems

During the backup runs, I got several of the following kernel messages in “dmesg”. This seems like a normal warning for the iSCSI use-case scenario:

[13200.272157] INFO: task jbd2/dm-0-8:1931 blocked for more than 120 seconds.
[13200.272164] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13200.272168] jbd2/dm-0-8 D f2abdc80 0 1931 2 0x00000000