/contrib/famzah

Enthusiasm never stops



mpssh-py — half a million executions for 3 years

This year marks the 3rd birthday of “mpssh-py”! 🙂

I did some stats analysis on my logs, and it revealed the following:

  • 500,000 SSH executions by “mpssh-py” on my local machine for the last 3 years.
  • No bugs for the last 3 years.

This is proof of the power of interpreted languages, and of the fact that the single-responsibility design approach helps make programs more reliable.




C++ vs. Python vs. Perl vs. PHP performance benchmark (2016)

There are newer benchmarks: C++ vs. Python vs. PHP vs. Java vs. Others performance benchmark (2016 Q3)

The benchmarks here do not try to be complete; they show the performance of the languages in just one aspect: loops, dynamic arrays of numbers, and basic math operations.
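To give an idea of this workload class, here is a hypothetical sketch in Python. It is not the actual benchmark source, which is linked per language in the results table below:

# Hypothetical sketch of the measured workload class, NOT the actual
# benchmark source: tight loops, a dynamically grown array of numbers,
# and basic math operations.
def find_primes(limit):
    primes = []                      # dynamic array of numbers
    for n in range(2, limit):
        is_prime = True
        for p in primes:
            if p * p > n:            # no divisor below sqrt(n) found
                break
            if n % p == 0:           # basic math inside tight loops
                is_prime = False
                break
        if is_prime:
            primes.append(n)
    return primes

print(len(find_primes(100000)))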

This is a redo of the tests done in previous years. You are strongly encouraged to read the additional information about the tests in the article.

Here are the benchmark results:

Language                 | CPU time (user / system / total, seconds) | Slower than C++ | Slower than previous | Language version | Source code
C++ (optimized with -O2) |  0.952 / 0.172 /  1.124                   |               - |                    - | g++ 5.3.1        | link
Java 8 (non-std lib)     |  1.332 / 0.096 /  1.428                   |             27% |                  27% | 1.8.0_72         | link
Python 2.7 + PyPy        |  1.560 / 0.160 /  1.720                   |             53% |                  20% | PyPy 4.0.1       | link
Javascript (nodejs)      |  1.524 / 0.516 /  2.040                   |             81% |                  19% | 4.2.6            | link
C++ (not optimized)      |  2.988 / 0.168 /  3.156                   |            181% |                  55% | g++ 5.3.1        | link
PHP 7.0                  |  6.524 / 0.184 /  6.708                   |            497% |                 113% | 7.0.2            | link
Java 8                   | 14.616 / 0.908 / 15.524                   |           1281% |                 131% | 1.8.0_72         | link
Python 3.5               | 18.656 / 0.348 / 19.004                   |           1591% |                  22% | 3.5.1            | link
Python 2.7               | 20.776 / 0.336 / 21.112                   |           1778% |                  11% | 2.7.11           | link
Perl                     | 25.044 / 0.236 / 25.280                   |           2149% |                  20% | 5.22.1           | link
PHP 5.6                  | 66.444 / 2.340 / 68.784                   |           6020% |                 172% | 5.6.17           | link
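The “slower than” columns are the relative overheads of the total CPU time. A quick sanity check of the arithmetic, using the Python 2.7 row from the table:

# Total CPU time of Python 2.7 relative to the optimized C++:
cpp_total = 1.124
python27_total = 21.112
print("{:.0%}".format(python27_total / cpp_total - 1))  # prints "1778%"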

The clear winner among the script languages is… PHP 7. 🙂

Yes, that’s not a mistake. Apparently the PHP team did a great job! The rumor that PHP 7 is really fast is confirmed for this particular benchmark test. You can also review the PHP 7 infographic by the Zend Performance Team.

Brief analysis of the results:

  • NodeJS got almost 2x faster.
  • Java 8 seems almost 2x slower.
  • Python shows no significant change in performance. Every new release is a little bit faster, but overall Python is steadily 15x slower than C++.
  • Perl has the same trend as Python and is steadily 22x slower than C++.
  • PHP 5.x is the slowest, with results between 47x and 60x slower than C++.
  • PHP 7 delivered the big surprise. It is about 10x faster than PHP 5.x, and about 3x faster than Python, the next fastest scripting language.

The tests were run on a Debian Linux 64-bit machine.

You can download the source codes, an Excel results sheet, and the benchmark batch script at:
https://github.com/famzah/langs-performance

 



Convert human-readable sizes back to raw numbers

Ever needed to convert lots of lines containing sizes like 1M or 1G to their raw number representation?

Here is a sample:

$ cat sample
26140   132K   1.9G   1.5G     ?K     0K     8K     0K   5% mysqld
26140   132K   1.9G   1.5G     ?K     4K     8K     0K   5% mysqld
26140   132K   1.9G   1.5G     ?K     0K     0K     0K   5% mysqld
26140   132K   1.9G   1.5G     ?K    -8K     0K     0K   5% mysqld
26140   132K   1.9G   1.6G     ?K     0K    20K     0K   5% mysqld
26140   132K   1.9G   1.6G     ?K     0K    56K     0K   5% mysqld
26140   132K   1.9G   1.7G     ?K    -4K     4K     0K   5% mysqld
26140   132K   1.9G   1.7G     ?K     0K    16K     0K   5% mysqld
26140   132K   1.9G   1.8G     ?K     0K     0K     0K   5% mysqld

The following Perl one-liner comes to the rescue:

perl -Mstrict -Mwarnings -n -e 'my %p=( K=>3, M=>6, G=>9, T=>12); s/(\d+(?:\.\d+)?)([KMGT])/$1*10**$p{$2}/ge; print'

In the end you get:

$ cat sample | perl -Mstrict -Mwarnings -n -e 'my %p=( K=>3, M=>6, G=>9, T=>12); s/(\d+(?:\.\d+)?)([KMGT])/$1*10**$p{$2}/ge; print'
26140   132000   1900000000   1500000000     ?K     0     8000     0   5% mysqld
26140   132000   1900000000   1500000000     ?K     4000     8000     0   5% mysqld
26140   132000   1900000000   1500000000     ?K     0     0     0   5% mysqld
26140   132000   1900000000   1500000000     ?K    -8000     0     0   5% mysqld
26140   132000   1900000000   1600000000     ?K     0    20000     0   5% mysqld
26140   132000   1900000000   1600000000     ?K     0    56000     0   5% mysqld
26140   132000   1900000000   1700000000     ?K    -4000     4000     0   5% mysqld
26140   132000   1900000000   1700000000     ?K     0    16000     0   5% mysqld
26140   132000   1900000000   1800000000     ?K     0     0     0   5% mysqld

You can now paste this output into Excel, for example, in order to create a nice chart from it.
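If Perl is not your thing, a rough Python equivalent of the same filter could look like this. It is a sketch which mirrors the decimal multipliers of the one-liner above; save it as, say, “convert.py” and use it in the same pipeline, e.g. “cat sample | python3 convert.py”:

import re
import sys

# Decimal multipliers, exactly as in the Perl one-liner above.
POWER = {"K": 3, "M": 6, "G": 9, "T": 12}

def expand(m):
    return str(round(float(m.group(1)) * 10 ** POWER[m.group(2)]))

for line in sys.stdin:
    sys.stdout.write(re.sub(r"(\d+(?:\.\d+)?)([KMGT])", expand, line))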



Properly escape arbitrary data for JavaScript in an HTML page

I’ve encountered different techniques which (try to) solve this problem. Some of them escape only the single/double quotes; others sanitize the input by removing unexpected characters, etc. The solution, however, should be more general, and thus bulletproof.

We have no doubts about how to escape arbitrary data which we want displayed in an HTML page: we convert all special characters to HTML entities, and most programming languages have a function for that. In PHP that’s the htmlspecialchars() function. No developer writes their own version of it by substituting the ampersand character with “&amp;”, for example, and so on.

Why re-invent the wheel when dealing with arbitrary data for JavaScript in an HTML page, then? JavaScript expects data to be escaped as JSON: “Since JSON is a subset of JavaScript, it can be used in the language with no muss or fuss”.

The rules of thumb are:

  • When supplying arbitrary data to JavaScript, encode it as JSON. Let json_encode() put the opening and closing quotes.
  • If the JavaScript code is embedded in HTML code, the whole thing needs to be additionally HTML-escaped (converted to HTML entities).

Enough theory, let’s see the source code:

<?php
	$data = 'Any data, including <html tags>, \'"&;(){}'."\nNewline";
?>
<html>
<body>
	<script>
		// JavaScript not in HTML code, because we are inside a <script> block
		js_var1 = <?=json_encode($data)?>;
	</script>

	The input data is: <?=htmlspecialchars($data)?>
	<br><br>
	<a href="#" onclick="alert(<?=htmlspecialchars(json_encode($data))?>)">
		JavaScript in HTML code; supply data directly.
	</a>
	<br><br>
	<a href="#" onclick="alert(js_var1)">
		JavaScript in HTML code; supply data indirectly by using a JavaScript variable.
	</a>
</body>
</html>

The result seems a bit weird, even like broken HTML, when we supply the data directly inside the HTML code:

<a href="#" onclick="alert(&quot;Any data, including &lt;html tags&gt;, '\&quot;&amp;;(){}\nNewline&quot;)">
	JavaScript in HTML code; supply data directly.
</a>

A side note: make sure that your PHP data is in UTF-8, because json_encode() requires this, and htmlspecialchars() also interprets character encodings.
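The same two rules carry over to other languages. Here is, for example, a rough Python analogue (a sketch using the standard json and html modules):

import html
import json

data = "Any data, including <html tags>, '\"&;(){}\nNewline"

js_literal = json.dumps(data)        # rule 1: encode as JSON, quotes included
attr_safe = html.escape(js_literal)  # rule 2: additionally HTML-escape in HTML code

print("js_var1 = %s;" % js_literal)
print('<a href="#" onclick="alert(%s)">click</a>' % attr_safe)

# Caveat: unlike PHP's json_encode(), Python's json.dumps() does not escape
# "/", so a literal "</script>" inside the data could terminate an inline
# <script> block; escape "<" and ">" additionally in that context.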

I’ll be glad to hear your comments or see an example where this method of escaping fails.



OpenSSH ciphers performance benchmark (update 2015)

It’s been five years since the last OpenSSH ciphers performance benchmark. There are two fundamentally new things to consider, which also gave me the incentive to redo the tests:

  • Since OpenSSH version 6.7 the default set of ciphers and MACs has been altered to remove unsafe algorithms. In particular, CBC ciphers and arcfour* are disabled by default. This has been adopted in Debian “Jessie”.
  • Modern CPUs have hardware acceleration for AES encryption.

I tested five different platforms with CPUs with and without AES hardware acceleration, different OpenSSL versions, and different hosting environments, including dedicated servers, OpenVZ, and AWS.

Since the processing power of each platform is different, I had to choose a criterion against which to normalize the results, in order to be able to compare them. This was a rather tricky decision, and I hope that my conclusion is right. I chose to normalize against the “arcfour*”, “blowfish-cbc”, and “3des-cbc” speeds, because I doubt that their implementations have changed over time. They should run equally fast on each platform, because they don’t benefit from AES acceleration, and nobody has bothered to make them faster, since those ciphers have been considered obsolete for a long time.
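In other words, each platform’s measurements get divided by that platform’s own baseline, something like this sketch (with hypothetical numbers):

# Normalize per-cipher throughput (MB/s) against the baseline ciphers,
# whose speed is assumed to be stable across platforms and versions.
speeds = {                  # hypothetical measurements, MB/s
    "aes128-ctr": 550.0,
    "arcfour": 330.0,
    "blowfish-cbc": 120.0,
    "3des-cbc": 60.0,
}
baseline = ("arcfour", "blowfish-cbc", "3des-cbc")
ref = sum(speeds[c] for c in baseline) / len(baseline)
print({cipher: round(s / ref, 2) for cipher, s in speeds.items()})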

A summary chart with the results follows:
[chart: OpenSSH ciphers performance benchmark, 2015]

You can download the raw data as an Excel file. Here is the command which was run on each server:

# uses "/root/tmp/dd.txt" as a temporary file!
for cipher in aes128-cbc aes128-ctr aes128-gcm@openssh.com aes192-cbc aes192-ctr aes256-cbc aes256-ctr aes256-gcm@openssh.com arcfour arcfour128 arcfour256 blowfish-cbc cast128-cbc chacha20-poly1305@openssh.com 3des-cbc ; do
	for i in 1 2 3 ; do
		echo
		echo "Cipher: $cipher (try $i)"
		
		dd if=/dev/zero bs=4M count=1024 2>/root/tmp/dd.txt | pv --size 4G | time -p ssh -c "$cipher" root@localhost 'cat > /dev/null'
		grep -v records /root/tmp/dd.txt
	done
done

We can draw the following conclusions:

  • Servers which run a newer CPU with AES hardware acceleration enjoy the benefit of (1) much faster AES encryption using the recommended OpenSSH ciphers, and (2) some AES ciphers which are now even twice as fast as the old speed champion, namely “arcfour”. I could get those great speeds only with OpenSSL 1.0.1f or newer, but this may need more testing.
  • Servers whose CPU lacks AES hardware acceleration still get AES encryption which is twice as fast with the newest OpenSSH 6.7 using OpenSSL 1.0.1k, as tested on Debian “Jessie”. Maybe they optimized something in the library.

Test results may vary (a lot) depending on your hardware platform, Linux kernel, OpenSSH and OpenSSL versions.



“iperf” and “iftop” accuracy

While working on my latest pet project, which involved 10 GigE transfers, I noticed a significant difference between the results shown by “iperf” and “iftop”. A fellow blogger also noticed this discrepancy. In order to get to the bottom of this, I did some additional tests using different MTU sizes, and observed the output of “iperf”, “iftop”, “iptraf”, and the raw Linux network device counters as seen by “ifconfig”.

The test results are summarized in an online spreadsheet: https://goo.gl/MvJC8K
[spreadsheet preview: iperf vs. iftop vs. iptraf vs. raw stats]

Some notes about each application:

  • iperf – this tool measures the TCP performance, as per its documentation; therefore it counts the useful payload in a TCP/IP transfer; this is layer 4 in the OSI model
  • iftop – this tool counts all IP packets, as per its documentation; my tests show that it also operates at layer 4, just like “iperf”, because ARP traffic (at layer 3) is not counted at all; the fact that “iftop” cares about connections and ports also suggests that it operates at layer 4
  • iptraf – this tool seems too old by now; its results were off by a factor of 4 to 5
  • ifconfig – shows the most low-level statistics, namely bytes that passed as RX or TX through the network device; the most trusted source of performance data

We notice that both “iperf” and “iftop” measure the useful payload data that we can transfer per second. Since all OSI layers have some overhead, let’s take a look at what theory says about bandwidth efficiency in Ethernet:

  • with a standard MTU frame of 1500 bytes, we get 94.93% efficiency (5.07% overhead)
  • with a jumbo MTU frame of 9000 bytes, we get 99.14% efficiency (0.86% overhead)

Those numbers correspond very closely with the results shown by “iperf”.
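Those efficiency numbers can be derived as follows (a sketch, assuming 38 bytes of per-frame Ethernet overhead and 40 bytes of IPv4+TCP headers):

# TCP goodput efficiency over Ethernet for a given MTU.
# Per-frame overhead on the wire: preamble 7 + SFD 1 + MAC header 14 +
# FCS 4 + inter-frame gap 12 = 38 bytes; plus 40 bytes of IPv4+TCP headers.
def tcp_efficiency(mtu):
    payload = mtu - 40               # useful TCP payload per frame
    on_wire = mtu + 38               # bytes each frame occupies on the wire
    return payload / on_wire

for mtu in (1500, 9000):
    print("MTU %d: %.2f%% efficiency" % (mtu, 100 * tcp_efficiency(mtu)))
# MTU 1500: 94.93% efficiency
# MTU 9000: 99.14% efficiency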

It’s only “iftop” which differs a lot. Analysis of its source code reveals the reason for this and how we must interpret the displayed results:

#
# ui.c
#

void ui_print() {
...
    mvaddstr(y, COLS - 8 * HISTORY_DIVISIONS - 8, "rates:");

    draw_totals(&totals);
}

void draw_totals(host_pair_line* totals) {
    for(j = 0; j < HISTORY_DIVISIONS; j++) {
        readable_size((totals->sent[j] + totals->recv[j]) , buf, 10, 1024, options.bandwidth_in_bytes);
...
}

#
# ui_common.c
#

/*
 * Format a data size in human-readable format
 */
void readable_size(float n, char* buf, int bsize, int ksize, int bytes) {
    float size = 1;
...
    while(1) {
      size *= ksize;
...
        snprintf(buf, bsize, " %4.2f%s", n / size, bytes ? unit_bytes[i] : unit_bits[i]);

The authors of “iftop” decided to round to Gibibit (a multiple of 1024), instead of the more common Gigabit (a multiple of 1000). This makes the discrepancy in “iftop” bigger as the transfer rate gets higher. For Gigabit speeds the difference is about 7%.

Once the “iftop” values are converted from Gibibit to Gigabit, they also match the results from “iperf” and the raw Linux network device counters.
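The conversion itself is trivial; here is a sketch with a hypothetical reading:

# iftop divides by 1024**3 (see ksize=1024 above), i.e. it shows Gibit/s,
# while iperf and the kernel counters use the decimal 1000**3 (Gbit/s).
GIBIT = 1024 ** 3
GBIT = 1000 ** 3

iftop_reading = 9.0                  # hypothetical iftop value, in Gibit/s
print("%.2f Gbit/s" % (iftop_reading * GIBIT / GBIT))        # 9.66 Gbit/s
print("unit mismatch: %.1f%%" % (100 * (GIBIT / GBIT - 1)))  # 7.4%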



Linux md-RAID scalability on a 10 Gigabit network

The question for today is – does Linux md-RAID scale to 10 Gbit/s?

I wanted to build a proof of concept for a scalable, highly available, fault-tolerant, distributed block storage, which utilizes commodity hardware, runs on a 10 Gigabit Ethernet network, and uses well-tested open-source technologies. This is a simplified version of Ceph. The only single point of failure in this cluster is the client itself, which is inevitable in any solution.

Here is an overview diagram of the setup:
[diagram: overview of the Linux md-RAID setup on a 10 Gigabit network]

My test lab is hosted on AWS:

  • 3x “c4.8xlarge” storage servers
    • each of them has 5x 50 GB General Purpose (SSD) EBS volumes attached, which provide up to 160 MiB/s and 3000 IOPS for extended periods of time; practical tests showed 100 MB/s of sustained sequential read/write performance per volume
    • each EBS volume is managed via LVM and there is one logical volume with size 15 GB
    • each 15 GB logical volume is being exported by iSCSI to the client machine
  • 1x “c4.8xlarge” client machine
    • the client machine initiates an iSCSI connection to each single 15 GB logical volume, and thus has 15 identical iSCSI block devices (3 storage servers x 5 block devices = 15 block devices)
    • to achieve a 3x replication factor, the block devices from each storage server are grouped into 5x mdadm software RAID-1 (mirror) devices; each RAID-1 device “md1” to “md5” contains three disks from a different storage server, so that if one or two of the storage servers fail, this won’t affect the operation of the whole RAID-1 device
    • all RAID-1 devices “md1” to “md5” are grouped into a single RAID-0 (stripe), in order to utilize the full bandwidth of all devices through a single block device, namely the “md99” RAID-0 device, which also combines the capacity of all “md1” to “md5” devices and equals 75 GB
  • 10 Gigabit network in a VPC using Jumbo frames
  • the storage servers and the client machine were limited on boot to 4 CPUs and 2 GB RAM, in order to minimize the effect of the Linux disk cache
  • only sequential and random reading were benchmarked
  • Linux md RAID-1 (mirror) does not read from all underlying disks by default, so I had to create a RAID-1E (mirror) configuration; more info here and here; the “mdadm create” options follow: --level=10 --raid-devices=3 --layout=o3 (a sketch of the resulting commands is shown after this list)
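To make the nesting concrete, here is a hypothetical sketch which prints the mdadm commands for this layout. The iSCSI device names are made up for illustration; the RAID-1E options are the ones quoted above:

# 5x RAID-1E mirrors (one disk from each of the 3 storage servers),
# then one RAID-0 stripe on top of the mirrors.
servers = 3
volumes = 5

mirrors = []
for i in range(1, volumes + 1):
    disks = " ".join("/dev/iscsi/server%d-vol%d" % (s, i)
                     for s in range(1, servers + 1))
    print("mdadm --create /dev/md%d --level=10 --raid-devices=%d "
          "--layout=o3 %s" % (i, servers, disks))
    mirrors.append("/dev/md%d" % i)

print("mdadm --create /dev/md99 --level=0 --raid-devices=%d %s"
      % (volumes, " ".join(mirrors)))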



HP ProBook 450 G2 drivers

I recently changed my old laptop for a new HP ProBook 450 G2. The driver installation on my Windows 7 Home Premium (64-bit) was pretty straightforward, except for the following two devices, which were shown as not supported in the Device Manager:

  1. Other devices -> Unknown device:
    Hardware Ids: ACPI\HPQ6007
    Device Instance Path: ACPI\HPQ6007\3&33FD14CA&0
    Parent: ACPI\PNP0A08\2&daba3ff&1
    
  2. Other devices -> PCI Data Acquisition and Signal Processing Controller:
    Hardware Ids:
    	PCI\VEN_8086&DEV_9C24&SUBSYS_2248103C&REV_04
    	PCI\VEN_8086&DEV_9C24&SUBSYS_2248103C
    	PCI\VEN_8086&DEV_9C24&CC_118000
    	PCI\VEN_8086&DEV_9C24&CC_1180
    Compatible Ids:
    	PCI\VEN_8086&DEV_9C24&REV_04
    	PCI\VEN_8086&DEV_9C24
    	PCI\VEN_8086&CC_118000
    	PCI\VEN_8086&CC_1180
    	PCI\VEN_8086
    	PCI\CC_118000
    	PCI\CC_1180
    Physical Device Object name: \Device\NTPNP_PCI0014
    Device Instance Path: PCI\VEN_8086&DEV_9C24&SUBSYS_2248103C&REV_04\3&33FD14CA&0&FE
    Parent: ACPI\PNP0A08\2&daba3ff&1
    

The first one turned out to be the driver for “HP 3D DriveGuard 6”, which I didn’t install because my SSD drive won’t benefit from such a feature.

The second one was a more interesting case. It turned out that not all HP drivers were listed when I selected my operating system, “Microsoft Windows 7 Home Premium (64-bit)”, at the HP drivers website. I had to change the OS to “Microsoft Windows 7 Professional (64-bit)” in order to get the option to download the “Intel Chipset Installation Utility”, which resolved the missing drivers.



The “cp” command may corrupt your files on Debian Wheezy

We recently had two files corrupted on Debian Wheezy (the current “stable” release). The first one contained garbage instead of the real data; the other contained only zero characters. Only a small part of each file, about 3 KB, was corrupted. This affects both “ext3” and “ext4” file systems.

It turns out to be a free-memory read bug in cp from coreutils-[8.11..8.19], reported to GNU in Oct/2012. It was also reported to Debian almost a year ago, in Apr/2014, with severity “grave”.

Today we test whether the bug has been fixed, using the PoC given in the original GNU bug report (it creates a heavily sparse file by writing 1 KB of data after each 4 KB hole):

$ perl -e 'for (1..3333) { sysseek (*STDOUT, 4096, 1)' -e '&& syswrite (*STDOUT, "a" x 1024) or die "$!"}' > j

$ valgrind cp j j2

==13175== Memcheck, a memory error detector
==13175== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==13175== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==13175== Command: cp j j2
==13175== 
==13175== Invalid read of size 4
==13175==    at 0x8051229: ??? (in /bin/cp)
==13175==    by 0x153FFF: ???
==13175==  Address 0x424ed0c is 1,356 bytes inside a block of size 1,440 free'd
==13175==    at 0x40283EE: realloc (vg_replace_malloc.c:632)
==13175==    by 0x805820B: ??? (in /bin/cp)
==13175==    by 0x153FFF: ???

...

==15843== ERROR SUMMARY: 15 errors from 9 contexts (suppressed: 25 from 6)

It turns out that the bug is not fixed in Debian Wheezy. Unfortunately, upgrading to the “coreutils” package from Jessie, where this bug is not present, is not an option: that package depends on a newer “libc6” and would furthermore introduce too many (untested) changes to the core utilities.

Here is how to rebuild the “coreutils” package by applying the “cp” data corruption patch:

root@machine1:~# cowbuilder --login

COW-machine1:~# apt-get update
COW-machine1:~# apt-get upgrade

COW-machine1:~# mkdir /root/coreutils
COW-machine1:~# cd /root/coreutils

COW-machine1:~/coreutils# apt-get source coreutils
COW-machine1:~/coreutils# apt-get build-dep coreutils

COW-machine1:~/coreutils# cd coreutils-8.13
COW-machine1:~/coreutils/coreutils-8.13# wget 'http://git.savannah.gnu.org/cgit/coreutils.git/patch/?id=64aef5fb9afecc023a6e719da161dbbf450908b8' -O cp-avoid_data_corrupting_free_memory_read.patch

COW-machine1:~/coreutils/coreutils-8.13# patch -p1 < cp-avoid_data_corrupting_free_memory_read.patch
COW-machine1:~/coreutils/coreutils-8.13# DEBFULLNAME='Admin Team' DEBEMAIL='box@example.com' dch --local '~patched' 'Local build with cp data corruption patch'
COW-machine1:~/coreutils/coreutils-8.13# dpkg-buildpackage -b -rfakeroot

root@machine1:~# cp /var/cache/pbuilder/build/cow.1385/root/coreutils/coreutils_8.13-3.5~patched1_i386.deb /root/tmp/

Finally, you need to install the “.deb” file on your system and prevent APT from auto-upgrading it. You’d need to recompile the package every time Debian “stable” releases a mainstream update for “coreutils”, which doesn’t happen very often. Furthermore, we hope that Debian will react to the bug report and fix the bug in their source tree for Wheezy “stable”.
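Putting the package on hold should prevent the auto-upgrade; for example, “echo coreutils hold | dpkg --set-selections”, or “apt-mark hold coreutils” with newer APT versions.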



BlackVue DR550GW-2CH dash camera samples

Here are a few uncompressed video samples from my BlackVue Wi-Fi 2 Channel DR550GW-2CH car recorder. Each pair of files represents a capture from the front and from the rear camera. I’ve tried to demonstrate the performance under different light conditions.

I’ve been using the camera in always-on mode via the Power Magic PRO. This produces footage even when the car is parked. So far I have had no problems with the car battery when leaving the car parked for a day or two.