/contrib/famzah

Enthusiasm never stops


18 Comments

C++ vs. Python vs. PHP vs. Java vs. Others performance benchmark (2016 Q3)

The benchmarks here do not try to be complete, as they are showing the performance of the languages in one aspect, and mainly: loops, dynamic arrays with numbers, basic math operations.

This is an improved redo of the tests done in previous years. You are strongly encouraged to read the additional information about the tests in the article.

Here are the benchmark results:

Language CPU time Slower than Language
version
Source
code
User System Total C++ previous
C++ (optimized with -O2) 0.915 0.058 0.973 g++ 6.1.1 link
Java 8 (non-std lib) 1.120 0.006 1.126 15% 15% 1.8.0_102 link
Python 2.7 + PyPy 1.386 0.128 1.514 55% 34% PyPy 5.4.0 link
Go 2.646 0.110 2.757 183% 82% 1.7 link
C++ (not optimized) 2.896 0.058 2.954 203% 7% g++ 6.1.1 link
PHP 7.0 6.568 0.171 6.739 592% 128% 7.0.10 link
Javascript (nodejs) 6.680 0.522 7.202 639% 6% 4.3.1 link
Java 8 (see notes) 12.123 0.078 12.200 1153% 69% 1.8.0_102 link
Ruby 12.912 0.235 13.147 1250% 7% 2.3.1 link
Python 3.5 17.753 0.142 17.895 1738% 36% 3.5.2 link
Python 2.7 23.565 0.126 23.691 2334% 32% 2.7.12 link
Perl 25.543 0.019 25.562 2526% 7% 5.22.2 link

The big difference this time is that we use a slightly modified benchmark method. Programs are no longer limited to just 10 loops. Instead they run for 90 wall-clock seconds, and then we divide and normalize their performance as if they were running for only 10 loops. This way we can compare with the previous results. The benefit of doing the tests like this is that the startup and shutdown times of the interpreters should make almost no difference now. It turned out that the new method doesn’t significantly change the outcome compared to the previous benchmark runs, which is good as the old way of benchmarks seems also correct.

For the curious readers, the raw results also show the maximum used memory (RSS).

Brief analysis of the results:

  • NodeJS version 4.3.x got much slower than the previous major version 4.2.x. This is the only surprise.
  • Python and Perl seem a bit slower than before but this is probably due to the fact that C++ performed even better because of the new benchmark method.
  • Java 8 didn’t perform much faster as we expected. Maybe it gets slower as more and more loops are done, which also allocated more RAM.
  • Also review the analysis in the old 2016 tests for more information.

The tests were run on a Debian Linux 64-bit machine.

You can download the source codes, raw results, and the benchmark batch script at:
https://github.com/famzah/langs-performance


22 Comments

C++ vs. Python vs. Perl vs. PHP performance benchmark (2016)

There are newer benchmarks: C++ vs. Python vs. PHP vs. Java vs. Others performance benchmark (2016 Q3)

The benchmarks here do not try to be complete, as they are showing the performance of the languages in one aspect, and mainly: loops, dynamic arrays with numbers, basic math operations.

This is a redo of the tests done in previous years. You are strongly encouraged to read the additional information about the tests in the article.

Here are the benchmark results:

Language CPU time Slower than Language
version
Source
code
User System Total C++ previous
C++ (optimized with -O2) 0.952 0.172 1.124 g++ 5.3.1 link
Java 8 (non-std lib) 1.332 0.096 1.428 27% 27% 1.8.0_72 link
Python 2.7 + PyPy 1.560 0.160 1.720 53% 20% PyPy 4.0.1 link
Javascript (nodejs) 1.524 0.516 2.040 81% 19% 4.2.6 link
C++ (not optimized) 2.988 0.168 3.156 181% 55% g++ 5.3.1 link
PHP 7.0 6.524 0.184 6.708 497% 113% 7.0.2 link
Java 8 14.616 0.908 15.524 1281% 131% 1.8.0_72 link
Python 3.5 18.656 0.348 19.004 1591% 22% 3.5.1 link
Python 2.7 20.776 0.336 21.112 1778% 11% 2.7.11 link
Perl 25.044 0.236 25.280 2149% 20% 5.22.1 link
PHP 5.6 66.444 2.340 68.784 6020% 172% 5.6.17 link

The clear winner among the script languages is… PHP 7.🙂

Yes, that’s not a mistake. Apparently the PHP team did a great job! The rumor that PHP 7 is really fast confirmed for this particular benchmark test. You can also review the PHP 7 infographic by the Zend Performance Team.

Brief analysis of the results:

  • NodeJS got almost 2x faster.
  • Java 8 seems almost 2x slower.
  • Python has no significant change in the performance. Every new release is a little bit faster but overall Python is steadily 15x slower than C++.
  • Perl has the same trend as Python and is steadily 22x slower than C++.
  • PHP 5.x is the slowest with results between 47x to 60x behind C++.
  • PHP 7 made the big surprise. It is about 10x faster than PHP 5.x, and about 3x faster than Python which is the next fastest script language.

The tests were run on a Debian Linux 64-bit machine.

You can download the source codes, an Excel results sheet, and the benchmark batch script at:
https://github.com/famzah/langs-performance

 


Leave a comment

Convert human-readable sizes back to raw numbers

Ever needed to convert lots of lines with 1M or 1G to their raw number representation?

Here is a sample:

$ cat sample
26140   132K   1.9G   1.5G     ?K     0K     8K     0K   5% mysqld
26140   132K   1.9G   1.5G     ?K     4K     8K     0K   5% mysqld
26140   132K   1.9G   1.5G     ?K     0K     0K     0K   5% mysqld
26140   132K   1.9G   1.5G     ?K    -8K     0K     0K   5% mysqld
26140   132K   1.9G   1.6G     ?K     0K    20K     0K   5% mysqld
26140   132K   1.9G   1.6G     ?K     0K    56K     0K   5% mysqld
26140   132K   1.9G   1.7G     ?K    -4K     4K     0K   5% mysqld
26140   132K   1.9G   1.7G     ?K     0K    16K     0K   5% mysqld
26140   132K   1.9G   1.8G     ?K     0K     0K     0K   5% mysqld

The following Perl one-liner comes to the rescue:

perl -Mstrict -Mwarnings -n -e 'my %p=( K=>3, M=>6, G=>9, T=>12); s/(\d+(?:\.\d+)?)([KMGT])/$1*10**$p{$2}/ge; print'

In the end you get:

$ cat sample | perl -Mstrict -Mwarnings -n -e 'my %p=( K=>3, M=>6, G=>9, T=>12); s/(\d+(?:\.\d+)?)([KMGT])/$1*10**$p{$2}/ge; print'
26140   132000   1900000000   1500000000     ?K     0     8000     0   5% mysqld
26140   132000   1900000000   1500000000     ?K     4000     8000     0   5% mysqld
26140   132000   1900000000   1500000000     ?K     0     0     0   5% mysqld
26140   132000   1900000000   1500000000     ?K    -8000     0     0   5% mysqld
26140   132000   1900000000   1600000000     ?K     0    20000     0   5% mysqld
26140   132000   1900000000   1600000000     ?K     0    56000     0   5% mysqld
26140   132000   1900000000   1700000000     ?K    -4000     4000     0   5% mysqld
26140   132000   1900000000   1700000000     ?K     0    16000     0   5% mysqld
26140   132000   1900000000   1800000000     ?K     0     0     0   5% mysqld

You can now paste this output to Excel, for example, in order to create a nice chart of it.


Leave a comment

Properly escape arbitrary data for JavaScript in an HTML page

I’ve encountered different techniques which (try to) solve this problem. Some of them escape only the single/double quotes, others sanitize the input by removing unexpected characters, etc. The solution should however be more general, and thus bullet proof.

We have no doubts on how to escape arbitrary data which we want displayed in an HTML page. We convert all special characters to HTML entities, and most programming languages have a function for that. In PHP that’s the htmlspecialchars() function. No developer writes their own version by substituting the ampersand character with “&”, for example, and so on.

Why re-invent the wheel when dealing with arbitrary data for JavaScript in an HTML page then. JavaScript expects data to be escaped in JSON — “Since JSON is a subset of JavaScript, it can be used in the language with no muss or fuss”.

The rules of thumb are:

  • When supplying arbitrary data to JavaScript, encode it as JSON. Let json_encode() put the opening and closing quotes.
  • If the JavaScript code is embedded in HTML code, the whole thing needs to be additionally HTML-escaped (converted to HTML entities).

Enough theory, let’s see the source code:

<?php
	$data = 'Any data, including <html tags>, \'"&;(){}'."\nNewline";
?>
<html>
<body>
	<script>
		// JavaScript not in HTML code, because we are inside a <script> block
		js_var1 = <?=json_encode($data)?>;
	</script>

	The input data is: <?=htmlspecialchars($data)?>
	<br><br>
	<a href="#" onclick="alert(<?=htmlspecialchars(json_encode($data))?>)">
		JavaScript in HTML code; supply data directly.
	</a>
	<br><br>
	<a href="#" onclick="alert(js_var1)">
		JavaScript in HTML code; supply data indirectly by using a JavaScript variable.
	</a>
</body>
</html>

The result seems a bit weird, even like a broken HTML, when we supply the data directly inside the HTML code:

<a href="#" onclick="alert(&quot;Any data, including &lt;html tags&gt;, '\&quot;&amp;;(){}\nNewline&quot;)">
	JavaScript in HTML code; supply data directly.
</a>

A side note: Make sure that for PHP you stay in UTF-8, because json_encode() requires this, and htmlspecialchars() also interprets encodings.

I’ll be glad to hear your comments or see an example where this method of escaping fails.


2 Comments

MemAvailable metric for Linux kernels before 3.14 in /proc/meminfo

A great new metric has been introduced in “/proc/meminfo” in the Linux 3.14 kernel — MemAvailable:

An estimate of how much memory is available for starting new applications, without swapping. Calculated from MemFree, SReclaimable, the size of the file LRU lists, and the low watermarks in each zone.

The estimate takes into account that the system needs some page cache to function well, and that not all reclaimable slab will be reclaimable, due to items being in use. The impact of those factors will vary from system to system.

I recommend that you read the kernel commit description for further details.

Since many people are still using Linux kernels before 3.14, I’ve backported this kernel patch to Perl. You can download the sources from GitHub: https://github.com/famzah/linux-memavailable-procfs

Many system administrators rely on the “free” tool to get a quick overview of the system’s memory usage. Unfortunately, the latest “procps” package still doesn’t interpret the “MemAvailable” metric, and even if it did, we don’t have it in Linux kernels before 3.14. Actually, the developers of the “procps-ng” package (which is the Debian, Fedora and openSUSE fork of “procps“) have reacted and did the same thing as me. For kernels before 3.14 they emulate the metric in the same way, and for kernels after 3.14, they display the native metric from “/proc/meminfo”. This makes my Perl port more or less redundant.

This is the reason I wrote a quick replacement of the “free” tool in Perl. A few examples of it follow.

Typical memory usage overview, in MBytes:

famzah@vbox:~$ ./free.pl -m
             total       used       free  anonymous     kernel     caches     others
Mem:          2488       1228       1259        608         24        580         15
  -/+ caches              648       1840
Swap:         1565          0       1565

Typical memory usage overview, in percentage:

famzah@vbox:~$ ./free.pl -mp
             total       used       free  anonymous     kernel     caches     others
Mem:          2488        49%        51%        24%         1%        23%         1%
  -/+ caches              26%        74%
Swap:         1565         0%       100%

Extended memory usage overview, in MBytes:

famzah@vbox:~$ ./free.pl -me
             total       used       free  anonymous     kernel     caches     others
Mem:          2488       1228       1260        608         24        580         14
  -/+ caches              647       1840
Swap:         1565          0       1565

Extended memory usage info:
  Buffers                  83
  Cached                  785
  SwapCached                0
  Shmem                   308
  AnonPages               300
  Mapped                  122
  Unevict+Mlocked           5
  Dirty+Writeback           0
  NFS+Bounce                0

Extended memory usage overview, in percentage:

famzah@vbox:~$ ./free.pl -mep
             total       used       free  anonymous     kernel     caches     others
Mem:          2488        49%        51%        24%         1%        23%         1%
  -/+ caches              26%        74%
Swap:         1565         0%       100%

Extended memory usage info:
  Buffers                  3%
  Cached                  32%
  SwapCached               0%
  Shmem                   12%
  AnonPages               12%
  Mapped                   5%
  Unevict+Mlocked          0%
  Dirty+Writeback          0%
  NFS+Bounce               0%

Memory logo


Leave a comment

Google App Engine Datastore benchmark

I admire the idea of Google App Engine — a platform as a service where there is “no worrying about DBAs, servers, sharding, and load balancers”. And you can “auto scale to 7 billion requests per day”. I wanted to try the App Engine for a pet project where I had to collect, process and query a huge amount of time series. The fact that I needed to do fast queries over tens of 1000’s of records however made me wonder if the App Engine Datastore would be fast enough. Note that in order to reduce the amount of entities which are fetched from the database, couples of data entries are consolidated into a single database entity. This however imposes another limitation — fetching big data entities uses more memory on the running instance.

My language of choice is Java, because its performance for such computations is great. I am using the the Objectify interface (version 4.0rc2), which is also one of the recommended APIs for the Datastore.

Unfortunately, my tests show that the App Engine is not suitable for querying of such amount of data. For example, fetching and updating 1000 entries takes 1.5 seconds, and additionally uses a lot of memory on the F1 instance. You can review the Excel sheet file below for more detailed results.

Basically each benchmark test performs the following operations and then exits:

  1. Adds a bunch of entries.
  2. Gets those entries from the database and verifies them.
  3. Updates those entries in the database.
  4. Gets the entries again from the database and verifies them.
  5. Deletes the entries.

All Datastore operations are performed in a batch and thus in an asynchronous parallel way. Furthermore, no indexes are used but the entities are referenced directly by their key, which is the most efficient way to query the Datastore. The tests were performed at two separate days because I wanted to extend some of the tests. This is indicated in the results. A single warmup request was made before the benchmarks, so that the App Engine could pre-load our application.

The first observation is that using the default F1 instance once we start fetching more than 100 entities or once we start to add/update/delete more than 1000 entities, we saturate the Datastore -> Objectify -> Java throughput and don’t scale any more:
App Engine Datastore median time per entity for 1 KB entities @ F1 instance

The other interesting observation is that the Datastore -> Objectify -> Java throughput depends a lot on the App Engine instance. That’s not a surprising fact because the application needs to serialize data back and forth when communicating with the Datastore. This requires CPU power. The following two charts show that more CPU power speeds up all operations where serializing is involved — that is all Datastore operations but the Delete one which only queries the Datastore by supplying the keys of the entities, no data:
App Engine Datastore times per entity for 1000 x 1 KB entities @ F1 instance

App Engine Datastore times per entity for 1000 x 1 KB entities @ F4 instance

It is unexpected that the App Engine and the Datastore still have good and bad days. Their latency as well as CPU accounting could fluctuate a lot. The following chart shows the benchmark results which we got using an F1 instance. If you compare this to the chart above where a much more expensive F4 instance was used, you’ll notice that the 4-times cheaper F1 instance performed almost as fast as an F4 instance:
App Engine Datastore times per entity for 1000 x 1 KB entities @ F1 instance (test on another day)

The source code and the raw results are available for download at http://www.famzah.net/download/gae-datastore-performance/


Leave a comment

C++: Use the right size for the right job

Do you want to speed up your C++ application up to 4 times? Read on.

Recently I happened to be on a summer holiday with a friend who is a C++ guru. We discussed my language performance benchmark post, which had just received a lot of attention in regards to C++. It took him a few hours to demonstrate something really important and often underestimated by developers — that we need to allocate only as much memory as needed, and nothing more.

The C++ algorithm which we use at the benchmark blog page uses an array of “int” (which on Linux also happens to be 4 bytes == 32 bits). However, if you review the code you’ll notice that the “int” value is used as a pure boolean variable for the given array index. So my friend proposed the following optimized version, which uses 1 bit of memory for the boolean value and thus doesn’t waste the other 31-bits left if we used “int” instead:

--- primes.cpp	2013-06-17 17:16:09.000000000 +0300
+++ primes-bool.cpp	2013-06-17 18:43:28.000000000 +0300
@@ -12,9 +12,9 @@
 		res.push_back(2);
 		return res;
 	}
-	vector<int> s;
+	vector<bool> s;
 	for (int i = 3; i < n + 1; i += 2) {
-		s.push_back(i);
+		s.push_back(true);
 	}
 	int mroot = sqrt(n);
 	int half = (int)s.size();
@@ -23,9 +23,9 @@
 	while (m <= mroot) {
 		if (s[i]) {
 			int j = (int)((m*m - 3)/2);
-			s[j] = 0;
+			s[j] = false;
 			while (j < half) {
-				s[j] = 0;
+				s[j] = false;
 				j += m;
 			}
 		}
@@ -33,9 +33,9 @@
 		m = 2*i + 3;
 	}
 	res.push_back(2);
-	for (vector<int>::iterator it = s.begin() ; it < s.end(); ++it) {
-		if (*it) {
-			res.push_back(*it);
+	for (size_t it = 0; it < s.size(); ++it) {
+		if (s[it]) {
+			res.push_back(2*it + 3);
 		}
 	}

The end result is that the original C++ implementation finishes for 18.913 CPU seconds, while the memory-optimized completes in 5.354 seconds (28%). Impressive!

The bottom line is that even RAM is very fast and memory allocation is efficient, using only as few memory as really needed could vastly improve the performance of your C++ application. Which applies for programs developed in any language.