/contrib/famzah

Enthusiasm never stops



Google Cloud API and Python

Confusion is what I got the first time I wanted to automate Google Cloud using Python. While the documentation of Google Cloud is not bad, it’s far from ideal. Let me try to clarify things if you’re struggling with this like I did.

At its core, the Google Cloud API is an HTTP REST service (or gRPC for selected APIs). You have two ways of accessing it from Python: the “Client Libraries”, or the Python library that accesses the HTTP REST API directly.

While the “Client Libraries” are Google’s recommended option, in my experience you are better off with the second option, which accesses the HTTP REST API directly. That is because:

  • The Google Cloud HTTP REST API is documented better. I guess that is because it’s used by all the languages, not only Python, and because Google developers use it internally, too.
  • The official documentation of the Google Cloud services always gives an example for the REST APIs along with the other options like the web console, gcloud/gsutil, and the “Client Libraries” (where they are applicable).
  • The Google Cloud web console interface shows the resulting HTTP REST API request or command-line code for what you populated in the online forms. You can see this at the very bottom of the page. Therefore, you can use the web interface to enter what you intend to do and then easily see what you need to send to the HTTP REST API in order to achieve the same as you’d do if you clicked it online.
  • Not all Google Cloud APIs are wrapped in “Client Libraries” for a specific language like Python. For example, Compute Engine is not, so you would most probably need to use the HTTP REST API directly anyway. If that’s the case, why learn two libraries when you can learn only one and use only the HTTP REST API?

The installation of the Python library for Google Cloud direct HTTP REST API access is easy. There is documentation but the process is as straightforward as “pip install google-api-python-client”. You can then access any Google Cloud HTTP API and here is an oversimplified example:

import googleapiclient.discovery
compute = googleapiclient.discovery.build('compute', 'v1')
result = compute.regions().list(project='my_project_id', maxResults=2).execute()

Before you can execute the example, you have to set up your authentication. I personally find Authenticating as a service account very convenient but you can also review the other Google Cloud API authentication options.

Please note that even the simple Python library for Google Cloud direct HTTP REST API access has its minor peculiarities and is not a 1:1 mapping of the API. For example, for the healthChecks().update() method the HTTP REST API documentation names the selector field “resourceId”, while the Python library names it “healthCheck”. Therefore, you always need to consult both sets of documentation when you develop your scripts.

Here is what I keep in my bookmarks when working with the Google Cloud API using Python:

  • Google APIs Explorer — the core documentation of every service.
  • Python library for Google Cloud direct HTTP REST API access — I consult it for the named arguments of the methods if they differ from the official API (see the example in the previous paragraph). Additionally, if you use the list() methods, the results are usually paginated and you have to call list_next() which is well documented.
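The pagination mentioned above can be wrapped in a small generator. This is only a sketch: the helper name iterate_items() is made up for illustration, and it assumes that the list response keeps its results under the “items” key, as the Compute Engine list responses do.

```python
def iterate_items(collection, **list_kwargs):
    """Yield every item of a paginated list() call.

    `collection` is a resource object such as compute.regions().
    The helper itself is hypothetical, it is not part of the library.
    """
    request = collection.list(**list_kwargs)
    while request is not None:
        response = request.execute()
        # Assumption: results are stored under the 'items' key
        for item in response.get('items', []):
            yield item
        # list_next() returns None when there are no more pages
        request = collection.list_next(previous_request=request,
                                       previous_response=response)

# Usage, assuming authentication is already set up:
#   import googleapiclient.discovery
#   compute = googleapiclient.discovery.build('compute', 'v1')
#   for region in iterate_items(compute.regions(), project='my_project_id'):
#       print(region['name'])
```

The loop ends when list_next() returns None, so the caller never has to deal with page tokens.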



C++ vs. Python vs. PHP vs. Java vs. Others performance benchmark (2016 Q3)

The benchmarks here do not try to be complete: they show the performance of the languages in only one aspect, mainly loops, dynamic arrays with numbers, and basic math operations.

This is an improved redo of the tests done in previous years. You are strongly encouraged to read the additional information about the tests in the article.

Here are the benchmark results:

| Language | CPU time: User | CPU time: System | CPU time: Total | Slower than C++ | Slower than previous | Language version |
|---|---|---|---|---|---|---|
| C++ (optimized with -O2) | 0.899 | 0.053 | 0.951 | | | g++ 6.1.1 |
| Rust | 0.898 | 0.129 | 1.026 | 7% | 7% | 1.12.0 |
| Java 8 (non-std lib) | 1.090 | 0.006 | 1.096 | 15% | 6% | 1.8.0_102 |
| Python 2.7 + PyPy | 1.376 | 0.120 | 1.496 | 57% | 36% | PyPy 5.4.1 |
| C# .NET Core Linux | 1.583 | 0.112 | 1.695 | 78% | 13% | 1.0.0-preview2 |
| Javascript (nodejs) | 1.371 | 0.466 | 1.837 | 93% | 8% | 4.3.1 |
| Go | 2.622 | 0.083 | 2.705 | 184% | 47% | 1.7.1 |
| C++ (not optimized) | 2.921 | 0.054 | 2.975 | 212% | 9% | g++ 6.1.1 |
| PHP 7.0 | 6.447 | 0.178 | 6.624 | 596% | 122% | 7.0.11 |
| Java 8 (see notes) | 12.064 | 0.080 | 12.144 | 1176% | 83% | 1.8.0_102 |
| Ruby | 12.742 | 0.230 | 12.972 | 1263% | 6% | 2.3.1 |
| Python 3.5 | 17.950 | 0.126 | 18.077 | 1800% | 39% | 3.5.2 |
| Perl | 25.054 | 0.014 | 25.068 | 2535% | 38% | 5.24.1 |
| Python 2.7 | 25.219 | 0.114 | 25.333 | 2562% | 1% | 2.7.12 |

The big difference this time is that we use a slightly modified benchmark method. Programs are no longer limited to just 10 loops. Instead they run for 90 wall-clock seconds, and then we divide and normalize their performance as if they had run for only 10 loops. This way we can compare with the previous results. The benefit of testing like this is that the startup and shutdown times of the interpreters should make almost no difference now. It turned out that the new method doesn’t significantly change the outcome compared to the previous benchmark runs, which is good, as it suggests that the old benchmark method was also correct.
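To make the normalization concrete, here is a rough sketch of the arithmetic. The function name and the sample numbers are made up for illustration, and the exact formula used by the benchmark script may differ; this only assumes a simple linear scaling of the measured CPU time.

```python
def normalize_cpu_time(cpu_seconds, loops_done, target_loops=10):
    """Scale the CPU time measured over `loops_done` loops to `target_loops` loops."""
    if loops_done <= 0:
        raise ValueError('loops_done must be positive')
    return cpu_seconds * target_loops / float(loops_done)

# Hypothetical run: 85.5 CPU seconds spent on 900 loops within the 90-second window
print(round(normalize_cpu_time(85.5, 900), 3))  # 85.5 * 10 / 900 = 0.95
```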

For the curious readers, the raw results also show the maximum used memory (RSS).

Brief analysis of the results:

  • Rust, which we benchmark for the first time, is very fast. 🙂
  • C# .NET Core on Linux, which we also benchmark for the first time, performs very well, being about as fast as NodeJS and only 78% slower than C++. Memory usage peaked at 230 MB, which is the same as Python 3.5 and PHP 7.0, and half that of Java 8 and NodeJS.
  • NodeJS 4.3.x at first appeared much slower than the previous version 4.2.x, but this turned out to be a minor glitch in the parser which was easy to fix. NodeJS 4.3.x performs the same as 4.2.x.
  • Python and Perl seem a bit slower than before but this is probably due to the fact that C++ performed even better because of the new benchmark method.
  • Java 8 didn’t perform much faster, as we had expected it to. Maybe it gets slower as more and more loops are done, which also allocates more RAM.
  • Also review the analysis in the old 2016 tests for more information.

The tests were run on a Debian Linux 64-bit machine.

You can download the source codes, raw results, and the benchmark batch script at:
https://github.com/famzah/langs-performance

Update @ 2016-10-15: Added the Rust implementation. The minor versions of some languages were updated as well.
Update @ 2016-10-19: A redo which includes the NodeJS fix.
Update @ 2016-11-04: Added the C# .NET Core implementation.



C++ vs. Python vs. Perl vs. PHP performance benchmark (2016)

There are newer benchmarks: C++ vs. Python vs. PHP vs. Java vs. Others performance benchmark (2016 Q3)

The benchmarks here do not try to be complete: they show the performance of the languages in only one aspect, mainly loops, dynamic arrays with numbers, and basic math operations.

This is a redo of the tests done in previous years. You are strongly encouraged to read the additional information about the tests in the article.

Here are the benchmark results:

| Language | CPU time: User | CPU time: System | CPU time: Total | Slower than C++ | Slower than previous | Language version |
|---|---|---|---|---|---|---|
| C++ (optimized with -O2) | 0.952 | 0.172 | 1.124 | | | g++ 5.3.1 |
| Java 8 (non-std lib) | 1.332 | 0.096 | 1.428 | 27% | 27% | 1.8.0_72 |
| Python 2.7 + PyPy | 1.560 | 0.160 | 1.720 | 53% | 20% | PyPy 4.0.1 |
| Javascript (nodejs) | 1.524 | 0.516 | 2.040 | 81% | 19% | 4.2.6 |
| C++ (not optimized) | 2.988 | 0.168 | 3.156 | 181% | 55% | g++ 5.3.1 |
| PHP 7.0 | 6.524 | 0.184 | 6.708 | 497% | 113% | 7.0.2 |
| Java 8 | 14.616 | 0.908 | 15.524 | 1281% | 131% | 1.8.0_72 |
| Python 3.5 | 18.656 | 0.348 | 19.004 | 1591% | 22% | 3.5.1 |
| Python 2.7 | 20.776 | 0.336 | 21.112 | 1778% | 11% | 2.7.11 |
| Perl | 25.044 | 0.236 | 25.280 | 2149% | 20% | 5.22.1 |
| PHP 5.6 | 66.444 | 2.340 | 68.784 | 6020% | 172% | 5.6.17 |

The clear winner among the script languages is… PHP 7. 🙂

Yes, that’s not a mistake. Apparently the PHP team did a great job! The rumor that PHP 7 is really fast is confirmed for this particular benchmark test. You can also review the PHP 7 infographic by the Zend Performance Team.

Brief analysis of the results:

  • NodeJS got almost 2x faster.
  • Java 8 seems almost 2x slower.
  • Python has no significant change in the performance. Every new release is a little bit faster but overall Python is steadily 15x slower than C++.
  • Perl has the same trend as Python and is steadily 22x slower than C++.
  • PHP 5.x is the slowest, with results between 47x and 60x behind C++.
  • PHP 7 made the big surprise. It is about 10x faster than PHP 5.x, and about 3x faster than Python which is the next fastest script language.

The tests were run on a Debian Linux 64-bit machine.

You can download the source codes, an Excel results sheet, and the benchmark batch script at:
https://github.com/famzah/langs-performance

 



Validator for the Model key_name property in Google App Engine datastore (Python)

The Google App Engine datastore provides convenient data modeling with Python. One important aspect is the validation of the data stored in a Model instance. Each data key-value is stored as a Property which is an attribute of a Model class.

While every Property can be validated automatically by specifying a “validator” function, there is no option for the Model key name to be validated automatically. Note that our code can set the value of the key name manually, so the key name can be considered user data and must be validated. The key name is, by the way, the only unique index constraint supported by the Google datastore, similar to the “primary key” in relational databases.

Here is my version for a validation function for the Model’s key name:

from google.appengine.ext import db
import re

def ModelKeyNameValidator(self, regexp_string, *args, **kwargs):
	gotKey = None
	className = self.__class__.__name__

	if len(args) >= 2:
		if gotKey: raise Exception('Found key for second time for Model ' + className)
		gotKey = 'args'
		k = args[1] # key_name given as an unnamed argument
	if 'key' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + className)
		gotKey = 'Key'
		k = kwargs['key'].name() # key_name given as Key instance
	if 'key_name' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + className)
		gotKey = 'key_name'
		k = kwargs['key_name'] # key_name given as a keyword argument

	if not gotKey:
		raise Exception('No key found for Model ' + className)

	id = '%s.key_name(%s)' % (self.__class__.__name__, gotKey)
	if (not re.search(regexp_string, k)):
		raise ValueError('(%s) Value "%s" is invalid. It must match the regexp "%s"' % (id, k, regexp_string))

class ClubDB(db.Model):
	# key = url
	def __init__(self, *args, **kwargs):
		ModelKeyNameValidator(self, '^[a-z0-9-]{2,32}$', *args, **kwargs)
		# Name the class explicitly: super(self.__class__, ...) would recurse endlessly in a subclass of ClubDB
		super(ClubDB, self).__init__(*args, **kwargs)

	name = db.StringProperty(required = True)

As you can see, the proposed solution is not versatile enough: it requires you to copy and alter the ModelKeyNameValidator() function again and again for every new validation type. I strictly follow the Don’t Repeat Yourself principle in programming, so after much Googling and struggling with Python, I arrived at the following solution, which I actually use in my projects:

from google.appengine.ext import db
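import re

def string_type_validator(v):
	# NOTE: this helper is called below but is not shown in the original post;
	# its body here is an assumption about what it does (reject non-string values)
	if not isinstance(v, basestring):
		raise ValueError('Value "%s" is invalid. It must be a string' % (v,))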

def re_validator(id, regexp_string):
	def validator(v):
		string_type_validator(v)
		if (not re.search(regexp_string, v)):
			raise ValueError('(%s) Value "%s" is invalid. It must match the regexp "%s"' % (id, v, regexp_string))
	return validator

def length_validator(id, minlen, maxlen):
	def validator(v):
		string_type_validator(v)
		if minlen is not None and len(v) < minlen:
			raise ValueError('(%s) Value "%s" is invalid. It must be at least %s characters' % (id, v, minlen))
		if maxlen is not None and len(v) > maxlen:
			raise ValueError('(%s) Value "%s" is invalid. It must be at most %s characters' % (id, v, maxlen))
	return validator

def ModelKeyValidator(v, self, *args, **kwargs):
	gotKey = None

	if len(args) >= 2:
		if gotKey: raise Exception('Found key for second time for Model ' + self.__class__.__name__)
		gotKey = 'args'
		k = args[1] # key_name given as unnamed argument
	if 'key' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + self.__class__.__name__)
		gotKey = 'Key'
		k = kwargs['key'].name()
	if 'key_name' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + self.__class__.__name__)
		gotKey = 'key_name'
		k = kwargs['key_name']

	if not gotKey:
		raise Exception('No key found for Model ' + self.__class__.__name__)

	v.execute('%s.key_name(%s)' % (self.__class__.__name__, gotKey), k) # validate the key now

class DelayedValidator:
	''' Validator class which allows you to specify the "id" dynamically on validation call '''
	def __init__(self, v, *args): # specify the validation function and its arguments
		self.validatorArgs = args
		self.validatorFunction = v

	def execute(self, id, value):
		if not isinstance(id, basestring):
			raise Exception('No valid ID specified for the Validator object')
		func = self.validatorFunction(id, *(self.validatorArgs)) # get the validator function
		func(value) # do the validation

class ClubDB(db.Model):
	# key = url
	def __init__(self, *args, **kwargs):
		ModelKeyValidator(DelayedValidator(re_validator, '^[a-z0-9-]{2,32}$'), self, *args, **kwargs)
		# Name the class explicitly: super(self.__class__, ...) would recurse endlessly in a subclass of ClubDB
		super(ClubDB, self).__init__(*args, **kwargs)

	name = db.StringProperty(
		required = True,
		validator = length_validator('ClubDB.name', 1, None))

You probably noticed that in the second example I also added a validator for the “name” property. Note that the re_validator() and length_validator() functions can be re-used. Furthermore, thanks to the DelayedValidator class, which accepts a validator function and its arguments as constructor arguments, the ModelKeyValidator() function can be re-used without any modifications too.
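If you want to try the delayed validation idea without App Engine, the following stand-alone sketch may help. It is a simplified restatement of the code above, not the exact code from my projects: the string type check is left out and the sample key values are made up.

```python
import re

def re_validator(id, regexp_string):
    # Factory: returns a function which validates a single value
    def validator(v):
        if not re.search(regexp_string, v):
            raise ValueError('(%s) Value "%s" is invalid. It must match the regexp "%s"'
                             % (id, v, regexp_string))
    return validator

class DelayedValidator(object):
    """Stores the factory and its arguments; the "id" is supplied only at execute() time."""
    def __init__(self, factory, *args):
        self.factory = factory
        self.args = args

    def execute(self, id, value):
        self.factory(id, *self.args)(value)

key_check = DelayedValidator(re_validator, '^[a-z0-9-]{2,32}$')
key_check.execute('ClubDB.key_name(key_name)', 'my-club')       # passes silently
try:
    key_check.execute('ClubDB.key_name(key_name)', 'My Club!')  # uppercase, space and "!" are rejected
except ValueError as e:
    print(e)
```

The point of the delay is that the same DelayedValidator object can be created once, while the descriptive “id” used in the error message is decided only when the key is actually checked.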

P.S. It seems that all “validator” functions are executed every time a Model class is being instantiated. This means that no matter if you are updating/creating the data object, or you are simply reading it from the datastore, the assigned values are always validated. This surely wastes some CPU cycles, but for now I have no idea how to easily circumvent this.

Disclaimer: I’m new to Python and Google App Engine. But they seem fun! 🙂 Sorry for the long lines…



C++ vs. Python vs. Perl vs. PHP performance benchmark (part #2)

This time we will focus on the startup time. The process start time is important if your processes are not persistent. If you are using FastCGI, mod_perl, mod_php, or mod_python, then these statistics are not so important to you. However, if you are spawning many processes which do something small and live for a very short time, then you should consider the CPU resources which get wasted while the script interpreter is being initialized.

The benchmarked scripts do only one thing – say “Hello, world” on the standard output. They do not include any additional modules in their source code – this may or may not be your use case. Still, scripting languages often have many built-in functions, so for simple tasks you may never need to include other modules.

Here are the benchmark results:

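| Language | CPU time: User | CPU time: System | CPU time: Total | Slower than C++ | Slower than previous |
|---|---|---|---|---|---|
| C++ (with or w/o optimization) | 2.568 | 3.536 | 6.051 | | |
| Perl | 12.561 | 6.096 | 18.723 | 209% | 209% |
| PHP (w/o php.ini) | 20.473 | 13.877 | 34.918 | 477% | 86% |
| Python | 27.014 | 11.881 | 39.318 | 550% | 13% |
| Python + Psyco | 32.986 | 14.845 | 48.132 | 695% | 22% |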

The clear winner among the script languages this time is… Perl. 🙂

All scripts were invoked 3000 times using the following Bash loop:

time ( i=3000 ; while [ "$i" -gt 0 ]; do $CMD >/dev/null ; i=$(($i-1)); done )
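If you prefer to collect the numbers from a script, something similar can be sketched in Python. This is not what was used for the results above. The function name is made up, the “resource” module is available only on Unix-like systems, and the sketch counts only the CPU time of the child processes, while the Bash “time” above also includes the work done by the shell loop itself.

```python
import resource
import subprocess
import sys

def measure_startup(cmd, runs=3000):
    """Run `cmd` `runs` times; return (user, system, total) CPU seconds used by the children."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    for _ in range(runs):
        # Discard the output, like ">/dev/null" in the Bash loop
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return user, system, user + system

# A small number of runs just to try it out; the benchmark above used 3000
print(measure_startup([sys.executable, '-c', 'print("Hello, world!")'], runs=20))
```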

All tests were done on a Kubuntu Lucid box. The versions of the used software packages follow:

  • g++ (GNU project C and C++ compiler) 4.4.3
  • Python 2.6.5
  • Python Psyco 1.6 (1ubuntu2)
  • Perl 5.10.1
  • PHP 5.3.2 (1ubuntu4.2 with Suhosin-Patch), Zend Engine 2.3.0

The C++ implementation follows:

#include <iostream>
using namespace std;

int main() {
	cout << "Hello, world!\n";
	return 0;
}

The Perl implementation follows:

use strict;
use warnings;

print "Hello, world!\n";

The PHP implementation follows:

<?php
echo "Hello, world!\n";

The Python implementation follows:

#import psyco
#psyco.full()

print 'Hello, world!'

Update (Jan/14/2012): Copied the used test environment info here.



C++ vs. Python vs. Perl vs. PHP performance benchmark

Update: There are newer benchmark results.


This all began when a colleague of mine stated that Python was so damn slow for maths. This really astonished me and made me check it out, as my father once told me that he was very satisfied with Python because it was very maths oriented.

The benchmarks here do not try to be complete: they show the performance of the languages in only one aspect, mainly loops, dynamic arrays with numbers, and basic math operations.

Out of curiosity, Python was also benchmarked with and without the Psyco Python extension (now obsoleted by PyPy), which people say could greatly speed up the execution of any Python code without any modifications.

Here are the benchmark results:

| Language | CPU time: User | CPU time: System | CPU time: Total | Slower than C++ | Slower than previous | Language version |
|---|---|---|---|---|---|---|
| C++ (optimized with -O2) | 1.520 | 0.188 | 1.708 | | | g++ 4.5.2 |
| Java (non-std lib) | 2.446 | 0.150 | 2.596 | 52% | 52% | 1.6.0_26 |
| C++ (not optimized) | 3.208 | 0.184 | 3.392 | 99% | 31% | g++ 4.5.2 |
| Javascript (SpiderMonkey): see comment (SpiderMonkey seems as fast as C++ on Windows) | | | | | | |
| Javascript (nodejs) | 4.068 | 0.544 | 4.612 | 170% | 36% | 0.8.8 |
| Java | 8.521 | 0.192 | 8.713 | 410% | 150% | 1.6.0_26 |
| Python + Psyco | 13.305 | 0.152 | 13.457 | 688% | 54% | 2.6.6 |
| Ruby: see comment (Ruby seems 35% faster than standard Python) | | | | | | |
| Python | 27.886 | 0.168 | 28.054 | 1543% | 108% | 2.7.1 |
| Perl | 41.671 | 0.100 | 41.771 | 2346% | 49% | 5.10.1 |
| PHP 5.4: roga’s blog results (PHP 5.4 seems 33% faster than PHP 5.3) | | | | | | |
| PHP 5.3 | 94.622 | 0.364 | 94.986 | 5461% | 127% | 5.3.5 |

The clear winner among the script languages is… Python. 🙂

NodeJS JavaScript is pretty fast too, but internally it works more like a compiled language. See the comments below.

Please read the discussion about Java which I had with Isaac Gouy. He accused me of not comparing what I say I am comparing, and of not wanting to show how slow and how fast the Java example program can be. You deserve the whole story, so please read it if you are interested in Java.

Both PHP and Python are taking advantage of their built-in range() function, because they have one. This speeds up PHP by 5%, and Python by 20%.

The times include the interpretation/parsing phase for each language, but it’s so small that its significance is negligible. The math function is called 10 times, in order to have more reliable results. All scripts are using the very same algorithm to calculate the prime numbers in a given range. The correctness of the implementation is not so important, as we just want to check how fast the languages perform. The original Python algorithm was taken from http://www.daniweb.com/code/snippet216871.html.
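To give an idea of the kind of workload, here is a minimal sketch in Python. It is not the benchmarked source code (that one is in the download below); it is a plain Sieve of Eratosthenes written only for illustration, and the upper bound of 200000 is an arbitrary choice.

```python
import time

def primes_up_to(n):
    """Return all primes <= n using a simple Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # Cross out the multiples of i, starting at i*i
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
        i += 1
    return [k for k in range(n + 1) if is_prime[k]]

start = time.process_time()
for _ in range(10):  # the math function is called 10 times, as described above
    result = primes_up_to(200000)
cpu = time.process_time() - start
print('Found %d primes, CPU time %.3f s' % (len(result), cpu))
```

The sketch exercises the same things the benchmark is about: loops, a dynamic array with numbers, and basic math operations.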

The tests were run on an Ubuntu Linux machine.

You can download the source codes, an Excel results sheet, and the benchmark batch script at:
http://www.famzah.net/download/langs-performance/


Update (Jul/24/2010): Added the C++ optimized values.
Update (Aug/02/2010): Added a link to the benchmarks, part #2.
Update (Mar/31/2011): Using range() in PHP improves performance by 5%.
Update (Jan/14/2012): Re-organized the results summary table and the page. Added Java.
Update (Apr/02/2012): Added a link to PHP 5.4 vs. PHP 5.3 benchmarks.
Update (May/29/2012): Added the results for Java using a non-standard library.
Update (Jun/25/2012): Made the discussion about Java public, as well as added a note that range() is used for PHP and Python.
Update (Aug/31/2012): Updated benchmarks for the latest node.js.
Update (Oct/24/2012): Added the results for SpiderMonkey JavaScript.
Update (Jan/11/2013): Added the results for Ruby vs. Python and Nodejs.