
Enthusiasm never stops



C++ vs. Python vs. PHP vs. Java vs. Others performance benchmark (2016 Q3)

The benchmarks here do not try to be complete; they show the performance of the languages in one aspect only, mainly: loops, dynamic arrays with numbers, and basic math operations.

This is an improved redo of the tests done in previous years. You are strongly encouraged to read the additional information about the tests in the article.

Here are the benchmark results:

| Language | CPU time (User) | CPU time (System) | CPU time (Total) | Slower than C++ | Slower than previous | Language version |
|---|---|---|---|---|---|---|
| C++ (optimized with -O2) | 0.899 | 0.053 | 0.951 | | | g++ 6.1.1 |
| Rust | 0.898 | 0.129 | 1.026 | 7% | 7% | 1.12.0 |
| Java 8 (non-std lib) | 1.090 | 0.006 | 1.096 | 15% | 6% | 1.8.0_102 |
| Python 2.7 + PyPy | 1.376 | 0.120 | 1.496 | 57% | 36% | PyPy 5.4.1 |
| C# .NET Core Linux | 1.583 | 0.112 | 1.695 | 78% | 13% | 1.0.0-preview2 |
| Javascript (nodejs) | 1.371 | 0.466 | 1.837 | 93% | 8% | 4.3.1 |
| Go | 2.622 | 0.083 | 2.705 | 184% | 47% | 1.7.1 |
| C++ (not optimized) | 2.921 | 0.054 | 2.975 | 212% | 9% | g++ 6.1.1 |
| PHP 7.0 | 6.447 | 0.178 | 6.624 | 596% | 122% | 7.0.11 |
| Java 8 (see notes) | 12.064 | 0.080 | 12.144 | 1176% | 83% | 1.8.0_102 |
| Ruby | 12.742 | 0.230 | 12.972 | 1263% | 6% | 2.3.1 |
| Python 3.5 | 17.950 | 0.126 | 18.077 | 1800% | 39% | 3.5.2 |
| Perl | 25.054 | 0.014 | 25.068 | 2535% | 38% | 5.24.1 |
| Python 2.7 | 25.219 | 0.114 | 25.333 | 2562% | 1% | 2.7.12 |
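
The two “Slower than” columns can be reproduced from the Total column. Here is a minimal Python sketch, assuming the percentage is (total / reference - 1) × 100 with the fractional part dropped. The truncation is inferred from the rows checked below and is not stated in the article, and the function name is made up for illustration.

```python
def slower_than(total, reference):
    """Return how much slower `total` is than `reference`, in whole percent."""
    # Assumption: the fractional part is dropped rather than rounded.
    return int((total / reference - 1) * 100)


# Totals taken from the table above.
cpp, rust, nodejs, go = 0.951, 1.026, 1.837, 2.705

print(slower_than(rust, cpp))    # Rust vs. C++ -> 7
print(slower_than(go, cpp))      # Go vs. C++ -> 184
print(slower_than(go, nodejs))   # Go vs. the previous row (nodejs) -> 47
```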

The big difference this time is a slightly modified benchmark method. Programs are no longer limited to just 10 loops. Instead they run for 90 wall-clock seconds, and then we divide and normalize their performance as if they had run for only 10 loops. This way we can compare with the previous results. The benefit of testing like this is that the startup and shutdown times of the interpreters should make almost no difference now. It turned out that the new method doesn’t significantly change the outcome compared to the previous benchmark runs, which is good because it suggests that the old benchmark method was also sound.
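
The normalization step can be sketched in Python as follows. This is only an illustration of the arithmetic, not the code from the benchmark script; the function name and the sample numbers are hypothetical.

```python
def normalize_cpu_time(cpu_seconds, loops_done, target_loops=10):
    """Scale CPU time measured over `loops_done` loops to `target_loops` loops."""
    if loops_done <= 0:
        raise ValueError("loops_done must be positive")
    return cpu_seconds / loops_done * target_loops


# Hypothetical run: 85.5 CPU seconds spent on 900 loops within the 90-second window.
print(normalize_cpu_time(85.5, 900))  # roughly 0.95
```

Since a long run spreads the one-time interpreter startup cost over many loops, that cost nearly vanishes from the normalized figure.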

For the curious readers, the raw results also show the maximum used memory (RSS).

Brief analysis of the results:

  • Rust, which we benchmark for the first time, is very fast. 🙂
  • C# .NET Core on Linux, which we also benchmark for the first time, performs very well by being as fast as NodeJS and only 78% slower than C++. Memory usage peaked at 230 MB, which is the same as Python 3.5 and PHP 7.0, and half that of Java 8 and NodeJS.
  • NodeJS version 4.3.x initially appeared much slower than the previous version 4.2.x, which was the only surprise. It turned out to be a minor glitch in the parser which was easy to fix. With the fix, NodeJS 4.3.x performs the same as 4.2.x.
  • Python and Perl seem a bit slower than before, but this is probably because C++ performed even better with the new benchmark method.
  • Java 8 didn’t perform much faster, contrary to what we expected. Maybe it gets slower as more and more loops are done, which also allocates more RAM.
  • Also review the analysis in the old 2016 tests for more information.

The tests were run on a Debian Linux 64-bit machine.

You can download the source code, raw results, and the benchmark batch script at:
https://github.com/famzah/langs-performance

Update @ 2016-10-15: Added the Rust implementation. The minor versions of some languages were updated as well.
Update @ 2016-10-19: A redo which includes the NodeJS fix.
Update @ 2016-11-04: Added the C# .NET Core implementation.



C++ vs. Python vs. Perl vs. PHP performance benchmark (2016)

There are newer benchmarks: C++ vs. Python vs. PHP vs. Java vs. Others performance benchmark (2016 Q3)

The benchmarks here do not try to be complete; they show the performance of the languages in one aspect only, mainly: loops, dynamic arrays with numbers, and basic math operations.

This is a redo of the tests done in previous years. You are strongly encouraged to read the additional information about the tests in the article.

Here are the benchmark results:

| Language | CPU time (User) | CPU time (System) | CPU time (Total) | Slower than C++ | Slower than previous | Language version |
|---|---|---|---|---|---|---|
| C++ (optimized with -O2) | 0.952 | 0.172 | 1.124 | | | g++ 5.3.1 |
| Java 8 (non-std lib) | 1.332 | 0.096 | 1.428 | 27% | 27% | 1.8.0_72 |
| Python 2.7 + PyPy | 1.560 | 0.160 | 1.720 | 53% | 20% | PyPy 4.0.1 |
| Javascript (nodejs) | 1.524 | 0.516 | 2.040 | 81% | 19% | 4.2.6 |
| C++ (not optimized) | 2.988 | 0.168 | 3.156 | 181% | 55% | g++ 5.3.1 |
| PHP 7.0 | 6.524 | 0.184 | 6.708 | 497% | 113% | 7.0.2 |
| Java 8 | 14.616 | 0.908 | 15.524 | 1281% | 131% | 1.8.0_72 |
| Python 3.5 | 18.656 | 0.348 | 19.004 | 1591% | 22% | 3.5.1 |
| Python 2.7 | 20.776 | 0.336 | 21.112 | 1778% | 11% | 2.7.11 |
| Perl | 25.044 | 0.236 | 25.280 | 2149% | 20% | 5.22.1 |
| PHP 5.6 | 66.444 | 2.340 | 68.784 | 6020% | 172% | 5.6.17 |

The clear winner among the script languages is… PHP 7. 🙂

Yes, that’s not a mistake. Apparently the PHP team did a great job! The rumor that PHP 7 is really fast was confirmed for this particular benchmark test. You can also review the PHP 7 infographic by the Zend Performance Team.

Brief analysis of the results:

  • NodeJS got almost 2x faster.
  • Java 8 seems almost 2x slower.
  • Python has no significant change in the performance. Every new release is a little bit faster but overall Python is steadily 15x slower than C++.
  • Perl has the same trend as Python and is steadily 22x slower than C++.
  • PHP 5.x is the slowest, with results between 47x and 60x behind C++.
  • PHP 7 was the big surprise. It is about 10x faster than PHP 5.x, and about 3x faster than Python, which is the next fastest script language.

The tests were run on a Debian Linux 64-bit machine.

You can download the source code, an Excel results sheet, and the benchmark batch script at:
https://github.com/famzah/langs-performance

 



Properly escape arbitrary data for JavaScript in an HTML page

I’ve encountered different techniques which (try to) solve this problem. Some of them escape only the single/double quotes, others sanitize the input by removing unexpected characters, etc. The solution should however be more general, and thus bulletproof.

We have no doubts on how to escape arbitrary data which we want displayed in an HTML page. We convert all special characters to HTML entities, and most programming languages have a function for that. In PHP that’s the htmlspecialchars() function. No developer writes their own version by substituting the ampersand character with “&amp;”, for example, and so on.

Why re-invent the wheel then, when dealing with arbitrary data for JavaScript in an HTML page? JavaScript expects data to be escaped in JSON: “Since JSON is a subset of JavaScript, it can be used in the language with no muss or fuss”.

The rules of thumb are:

  • When supplying arbitrary data to JavaScript, encode it as JSON. Let json_encode() put the opening and closing quotes.
  • If the JavaScript code is embedded in HTML code, the whole thing needs to be additionally HTML-escaped (converted to HTML entities).

Enough theory, let’s see the source code:

<?php
	$data = 'Any data, including <html tags>, \'"&;(){}'."\nNewline";
?>
<html>
<body>
	<script>
		// JavaScript not in HTML code, because we are inside a <script> block
		js_var1 = <?=json_encode($data)?>;
	</script>

	The input data is: <?=htmlspecialchars($data)?>
	<br><br>
	<a href="#" onclick="alert(<?=htmlspecialchars(json_encode($data))?>)">
		JavaScript in HTML code; supply data directly.
	</a>
	<br><br>
	<a href="#" onclick="alert(js_var1)">
		JavaScript in HTML code; supply data indirectly by using a JavaScript variable.
	</a>
</body>
</html>

The result seems a bit weird, even like a broken HTML, when we supply the data directly inside the HTML code:

<a href="#" onclick="alert(&quot;Any data, including &lt;html tags&gt;, '\&quot;&amp;;(){}\nNewline&quot;)">
	JavaScript in HTML code; supply data directly.
</a>
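
The same two rules carry over to other languages. Below is a minimal Python sketch using the standard json and html modules; the helper names are made up for illustration. One difference is worth flagging: PHP’s json_encode() escapes “/” by default, while Python’s json.dumps() does not, so the sketch additionally escapes “<” for the <script> case to keep a literal “</script>” out of the output.

```python
import html
import json


def js_literal(data):
    """Encode data as a JavaScript literal for use inside a <script> block."""
    # json.dumps() adds the surrounding quotes itself. Unlike PHP's json_encode(),
    # it leaves "/" alone, so "<" is escaped to keep "</script>" out of the output.
    return json.dumps(data).replace("<", "\\u003c")


def js_in_attribute(data):
    """Encode data as a JavaScript literal for use inside an HTML attribute."""
    # First JSON for JavaScript, then HTML entities for the attribute value.
    return html.escape(json.dumps(data), quote=True)


data = 'Any data, including <html tags>, \'"&;(){}' + "\nNewline"
print("<script>js_var1 = " + js_literal(data) + ";</script>")
print('<a href="#" onclick="alert(' + js_in_attribute(data) + ')">link</a>')
```

Note that html.escape() with quote=True also converts single quotes to entities, so the attribute output differs slightly from the PHP result shown above while decoding to the same string.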

A side note: Make sure that for PHP you stay in UTF-8, because json_encode() requires this, and htmlspecialchars() also interprets encodings.

I’ll be glad to hear your comments or see an example where this method of escaping fails.



Parse XML into a PHP array

There are many different examples of how to parse an XML document into an array with PHP. Mine differs in that it:

  • is very memory efficient by using PHP references (similar to pointers in C)
  • uses no recursion, thus there is no limit on the XML subtree levels
  • is very strict and paranoid about correctness

The parsing is done using XML Parser.

An example input XML data follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<root>
	<first_item>Test 1st item</first_item>
	<first_level_nested>
		<item idx="0">value #1</item>
		<item idx="1">value #2</item>
		<second_level_nested>
			<item idx="0">value #3</item>
			<item idx="1">value #4</item>
		</second_level_nested>
	</first_level_nested>
	<second_item>Test 2nd item</second_item>
</root>

There is one specific hack here. Since XML allows an element with the same name to appear multiple times on the same subtree level (see <item> on lines #05, #06, #08, #09), and at the same time does not allow an element with a purely numeric name, we need to make the following exception for arrays which have numeric indexes:

  • If an element is named <item>, and it has an attribute named “idx”, then we will use this attribute as name, and respectively array key.

This is handled in the XmlCallback class, at the top of the startElement() method, where the element name is checked against 'item'. You can see the sources at the end of the article.

XML also allows an element to contain both character data and sub-elements. This cannot be parsed into a PHP array, and will result in an Exception.

The parsed PHP array looks as follows:

Array
(
	[root] => Array
	(
		[first_item] => Test 1st item
		[first_level_nested] => Array
		(
			[0] => value #1
			[1] => value #2
			[second_level_nested] => Array
			(
				[0] => value #3
				[1] => value #4
			)

		)

		[second_item] => Test 2nd item
	)

)

If you liked the results, you can download the sources which follow:

<?php

function xml_decode($output) {
	$xml_parser = xml_parser_create();
	$xml_callback = new XmlCallback();
	
	if (!xml_set_element_handler(
		$xml_parser,
		array($xml_callback, 'startElement'),
		array($xml_callback, 'endElement')
	)) throw new Exception('xml_set_element_handler() failed');
	if (!xml_set_character_data_handler($xml_parser, array($xml_callback, 'data'))) {
		throw new Exception('xml_set_character_data_handler() failed');
	}
	if (!xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0)) {
		throw new Exception('xml_parser_set_option() failed');
	}
	
	if (!xml_parse($xml_parser, $output, TRUE)) {
		$xml_error = sprintf(
			"%s at line %d",
			xml_error_string(xml_get_error_code($xml_parser)),
			xml_get_current_line_number($xml_parser)
		);
		throw new Exception("XML error: $xml_error\nXML data: $output");
	}
	
	xml_parser_free($xml_parser);
	
	return $xml_callback->getResult();
}

class XmlCallback {
	private $ret = null;
	/* assign and use references directly to the array, or else you'll be in trouble */
	private $ptr_stack = array();
	private $level = 0;

	public function __construct() {
		$this->ptr_stack[$this->level] =& $this->ret;
	}

	public function startElement($parser, $name, $attrs) {
		if ($name == 'item' && isset($attrs['idx'])) {
			$name = $attrs['idx']; /* reconstruct arrays with numeric indexes */
		}

		if (!isset($this->ptr_stack[$this->level])) {
			$this->ptr_stack[$this->level] = array();
			$this->ptr_stack[$this->level][$name] = null;
		} else {
			if (!is_array($this->ptr_stack[$this->level])) {
				if (!strlen(trim($this->ptr_stack[$this->level]))) {
					/* if until now we got only whitespace (thus scalar data),
					but now we start a nested elements structure, discard this
					whitespace, as it is most probably just space between the
					element tags */
					$this->ptr_stack[$this->level] = array();
				} else {
					throw new Exception('Mixed array and scalar data');
				}
			}
			if (isset($this->ptr_stack[$this->level][$name])) {
				/* isset() == (isset() && !is_null()) */
				throw new Exception("Duplicate element name: $name");
			}
		}

		/* array_push() */
		++$this->level;
		$this->ptr_stack[$this->level] =& $this->ptr_stack[$this->level-1 /* MINUS ONE! */][$name];
	}

	public function endElement($parser, $name) {
		if (!array_key_exists($this->level, $this->ptr_stack)) {
			throw new Exception('XML non-existing reference');
		}

		/* array_pop() */
		unset($this->ptr_stack[$this->level]);
		--$this->level;

		if ($this->level < 0) throw new Exception('XML stack underflow');
	}

	public function data($parser, $data) {
		if (is_array($this->ptr_stack[$this->level])) {
			if (strlen(trim($data))) { # check if this is just whitespace
				throw new Exception('Mixed array and scalar data');
			} else {
				/* we tolerate AND skip whitespace, if we're already in
				a nested elements structure, as this whitespace is most
				probably just space between the element tags */
				return;
			}
		}
		if (is_null($this->ptr_stack[$this->level])) {
			$this->ptr_stack[$this->level] = ''; /* first data input */
		}
		$this->ptr_stack[$this->level] .= $data; /* we may be called several times, in chunks */
	}

	public function getResult() {
		return $this->ret;
	}
}

Update, 20/Jul/2011: The source code was modified to handle white-space better, in order to fix the following tricky sample XML input: <item6> &amp; &lt; </item6>

Update, 30/Jul/2011: Another bugfix which handles empty responses like: <response/>
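
For readers who do not use PHP, the same idea (an explicit stack instead of recursion, the “idx” exception for <item>, and strict rejection of mixed content) can be sketched in Python with the standard xml.parsers.expat module. This is a loose illustration and not a port of the code above: the frame layout and names are my own, array keys stay strings, and every duplicate name is rejected.

```python
import xml.parsers.expat


def xml_decode(text):
    """Parse XML into nested dicts using an explicit stack (no recursion)."""
    # Each frame holds the children and text chunks of one open element.
    root = {"children": {}, "text": [], "nested": False}
    stack = [root]

    def start(name, attrs):
        if name == "item" and "idx" in attrs:
            name = attrs["idx"]  # rebuild arrays with numeric indexes
        parent = stack[-1]
        if "".join(parent["text"]).strip():
            raise ValueError("Mixed array and scalar data")
        parent["text"] = []  # only whitespace so far: discard it
        if name in parent["children"]:
            raise ValueError("Duplicate element name: " + name)
        parent["nested"] = True
        parent["children"][name] = None  # reserve the key
        stack.append({"name": name, "children": {}, "text": [], "nested": False})

    def end(name):
        frame = stack.pop()
        if frame["nested"]:
            value = frame["children"]
        elif frame["text"]:
            value = "".join(frame["text"])
        else:
            value = None  # empty element such as <response/>
        stack[-1]["children"][frame["name"]] = value

    def data(chunk):
        frame = stack[-1]
        if frame["nested"]:
            if chunk.strip():
                raise ValueError("Mixed array and scalar data")
            return  # whitespace between element tags
        frame["text"].append(chunk)  # data may arrive in several chunks

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = data
    parser.Parse(text, True)
    return root["children"]


sample = (
    "<root><first_item>Test 1st item</first_item>"
    "<nested><item idx='0'>value #1</item><item idx='1'>value #2</item></nested>"
    "</root>"
)
print(xml_decode(sample))
```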



Testing exception message with PHPUnit

PHPUnit has a built-in method to test if an expected exception occurred during a test case:

$this->setExpectedException('Exception');

You cannot however test the message of the exception. There are cases where a program may throw the same exception type, but with different messages for different errors, and you want to differentiate between them.

Here is my example code on how to reliably test for the type and message of an exception:

class staticSessionTest extends PHPUnit_Framework_TestCase {
	...
	function test_bad_data() {
		$emess = null; # stays null if no exception is thrown, so the assertion below fails
		try {
			$this->sess->start('must be array', FALSE, FALSE); # we expect an Exception here
		} catch (Exception $e) { $emess = $e->getMessage(); }
		# assertEquals() takes the expected value first, then the actual one
		$this->assertEquals('Session data must be an array', $emess);
	}
	...
}

Putting the assertEquals() outside of the try…catch block ensures that you cannot forget to test for the message. The type of the exception is coded inside the catch(…) block.
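
The same pattern works in other xUnit-style frameworks. Here is a minimal sketch with Python’s unittest module; start_session() is a hypothetical stand-in for the code under test, and the exception type is chosen only for illustration.

```python
import unittest


def start_session(data):
    """Hypothetical stand-in for the session code under test."""
    if not isinstance(data, dict):
        raise ValueError("Session data must be an array")
    return data


class SessionTest(unittest.TestCase):
    def test_bad_data(self):
        message = None
        try:
            start_session("must be array")  # an exception is expected here
        except ValueError as exc:  # the expected type is fixed by the except clause
            message = str(exc)
        # Outside the try block, so the check always runs, even if nothing was raised.
        self.assertEqual("Session data must be an array", message)
```

Saved in a file, the test can be run with “python -m unittest”.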


UPDATE: I just re-read the latest PHPUnit Annotations, and this feature is already included in the standard PHPUnit suite. The difference between my custom code and the “@expectedExceptionMessage” annotation is that the annotation is valid for the whole test block of execution, while using try…catch you can specify precisely where you expect the exception to occur.



PHP non-interactive usage in a cron job

Using a PHP script in a crontab is fairly easy, as stated in the “Using PHP from the command line” documentation… Until you start to get the following warning during the execution:

No entry for terminal type "unknown";
using dumb terminal settings.

The script works, but this nasty warning really bothers you.

Here is a sample crontab entry:

* * * * * root sudo -u www-data php -r 'echo "test";'

When executed, it prints the warning on STDERR.

Yes, I know I don’t need “sudo” here, but this was my initial usage pattern when I discovered the problem, and at first I suspected that “sudo” had gone crazy. Well, it wasn’t “sudo” to blame, but PHP.

Here is the fixed crontab entry:

* * * * * root sudo -u www-data TERM=dumb php -r 'echo "test";'

The issue was encountered on an Ubuntu 10.04 server. I thought crond usually sets $TERM to something… Anyway, problem solved.



C++ vs. Python vs. Perl vs. PHP performance benchmark (part #2)

This time we will focus on the startup time. The process start time is important if your processes are not persistent. If you are using FastCGI, mod_perl, mod_php, or mod_python, then these statistics are not so important to you. However, if you are spawning many processes which do something small and live for a very short time, then you should consider the CPU resources which get wasted while the script interpreter is being initialized.

The benchmarked scripts do only one thing: say “Hello, world” on the standard output. They do not include any additional modules in their source code, which may or may not match your use case. Still, scripting languages very often have quite a lot of built-in functions, and for simple tasks you never need to include other modules.

Here are the benchmark results:

| Language | CPU time (User) | CPU time (System) | CPU time (Total) | Slower than C++ | Slower than previous |
|---|---|---|---|---|---|
| C++ (with or w/o optimization) | 2.568 | 3.536 | 6.051 | | |
| Perl | 12.561 | 6.096 | 18.723 | 209% | 209% |
| PHP (w/o php.ini) | 20.473 | 13.877 | 34.918 | 477% | 86% |
| Python | 27.014 | 11.881 | 39.318 | 550% | 13% |
| Python + Psyco | 32.986 | 14.845 | 48.132 | 695% | 22% |

The clear winner among the script languages this time is… Perl. 🙂

All scripts were invoked 3000 times using the following Bash loop:

time ( i=3000 ; while [ "$i" -gt 0 ]; do $CMD >/dev/null ; i=$(($i-1)); done )
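
A comparable measurement can be sketched in Python. This is not the method used for the table above; it is a rough equivalent that spawns the command repeatedly and reads the CPU time of the child processes from os.times(). The function name and the sample command are assumptions for illustration.

```python
import os
import subprocess
import sys
import time


def time_startup(cmd, runs=3000):
    """Run `cmd` `runs` times and return the real, user, and system time spent."""
    start_wall = time.perf_counter()
    before = os.times()
    for _ in range(runs):
        # Discard the output, like ">/dev/null" in the Bash loop.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    after = os.times()
    return {
        "real": time.perf_counter() - start_wall,
        # CPU time consumed by the waited-for child processes.
        "user": after.children_user - before.children_user,
        "sys": after.children_system - before.children_system,
    }


# Small sample run: start the current Python interpreter five times.
print(time_startup([sys.executable, "-c", "print('Hello, world!')"], runs=5))
```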

All tests were done on a Kubuntu Lucid box. The versions of the used software packages follow:

  • g++ (GNU project C and C++ compiler) 4.4.3
  • Python 2.6.5
  • Python Psyco 1.6 (1ubuntu2)
  • Perl 5.10.1
  • PHP 5.3.2 (1ubuntu4.2 with Suhosin-Patch), Zend Engine 2.3.0

The C++ implementation follows:

#include <iostream>
using namespace std;

int main() {
	cout << "Hello, world!\n";
	return 0;
}

The Perl implementation follows:

use strict;
use warnings;

print "Hello, world!\n";

The PHP implementation follows:

<?php
echo "Hello, world!\n";

The Python implementation follows:

#import psyco
#psyco.full()

print 'Hello, world!'

Update (Jan/14/2012): Copied the used test environment info here.



C++ vs. Python vs. Perl vs. PHP performance benchmark

Update: There are newer benchmark results.


This all began when a colleague of mine stated that Python was so damn slow for maths. This really astonished me and made me check it out, as my father once told me that he was very satisfied with Python because it was very maths oriented.

The benchmarks here do not try to be complete; they show the performance of the languages in one aspect only, mainly: loops, dynamic arrays with numbers, and basic math operations.

Out of curiosity, Python was also benchmarked with and without the Psyco Python extension (now obsoleted by PyPy), which people say could greatly speed up the execution of any Python code without any modifications.

Here are the benchmark results:

| Language | CPU time (User) | CPU time (System) | CPU time (Total) | Slower than C++ | Slower than previous | Language version |
|---|---|---|---|---|---|---|
| C++ (optimized with -O2) | 1.520 | 0.188 | 1.708 | | | g++ 4.5.2 |
| Java (non-std lib) | 2.446 | 0.150 | 2.596 | 52% | 52% | 1.6.0_26 |
| C++ (not optimized) | 3.208 | 0.184 | 3.392 | 99% | 31% | g++ 4.5.2 |
| Javascript (SpiderMonkey) | see comment (SpiderMonkey seems as fast as C++ on Windows) | | | | | |
| Javascript (nodejs) | 4.068 | 0.544 | 4.612 | 170% | 36% | 0.8.8 |
| Java | 8.521 | 0.192 | 8.713 | 410% | 150% | 1.6.0_26 |
| Python + Psyco | 13.305 | 0.152 | 13.457 | 688% | 54% | 2.6.6 |
| Ruby | see comment (Ruby seems 35% faster than standard Python) | | | | | |
| Python | 27.886 | 0.168 | 28.054 | 1543% | 108% | 2.7.1 |
| Perl | 41.671 | 0.100 | 41.771 | 2346% | 49% | 5.10.1 |
| PHP 5.4 | roga’s blog results (PHP 5.4 seems 33% faster than PHP 5.3) | | | | | |
| PHP 5.3 | 94.622 | 0.364 | 94.986 | 5461% | 127% | 5.3.5 |

The clear winner among the script languages is… Python. 🙂

NodeJS JavaScript is pretty fast too, but internally it works more like a compiled language. See the comments below.

Please read the discussion about Java which I had with Isaac Gouy. He accused me of not comparing what I say I am comparing, and also of not wanting to show how slow and how fast the Java example program can be. You deserve the whole story, so please read it if you are interested in Java.

Both PHP and Python are taking advantage of their built-in range() function, because they have one. This speeds up PHP by 5%, and Python by 20%.

The times include the interpretation/parsing phase for each language, but it’s so small that its significance is negligible. The math function is called 10 times, in order to have more reliable results. All scripts are using the very same algorithm to calculate the prime numbers in a given range. The correctness of the implementation is not so important, as we just want to check how fast the languages perform. The original Python algorithm was taken from http://www.daniweb.com/code/snippet216871.html.
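
To give a feel for the shape of such a test, here is a minimal Python sketch. The sieve below is a stand-in chosen for brevity and is not the algorithm used in the benchmark (the real sources are in the download below); the function names and the range limit are assumptions.

```python
import time


def primes_up_to(n):
    """Sieve of Eratosthenes; a stand-in, not the benchmark's actual algorithm."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            # Cross out every multiple of i, starting at i * i.
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def benchmark(limit, repeats=10):
    """Call the math function `repeats` times and return (prime count, CPU seconds)."""
    count = 0
    start = time.process_time()
    for _ in range(repeats):
        count = len(primes_up_to(limit))
    return count, time.process_time() - start


count, cpu = benchmark(100_000)
print(f"Found {count} primes, CPU time {cpu:.3f}s")
```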

The tests were run on an Ubuntu Linux machine.

You can download the source code, an Excel results sheet, and the benchmark batch script at:
http://www.famzah.net/download/langs-performance/


Update (Jul/24/2010): Added the C++ optimized values.
Update (Aug/02/2010): Added a link to the benchmarks, part #2.
Update (Mar/31/2011): Using range() in PHP improves performance by 5%.
Update (Jan/14/2012): Re-organized the results summary table and the page. Added Java.
Update (Apr/02/2012): Added a link to PHP 5.4 vs. PHP 5.3 benchmarks.
Update (May/29/2012): Added the results for Java using a non-standard library.
Update (Jun/25/2012): Made the discussion about Java public, as well as added a note that range() is used for PHP and Python.
Update (Aug/31/2012): Updated benchmarks for the latest node.js.
Update (Oct/24/2012): Added the results for SpiderMonkey JavaScript.
Update (Jan/11/2013): Added the results for Ruby vs. Python and Nodejs.