Google App Engine Performance Profiling

September 14, 2010

When developing under Google App Engine, developers need to pay attention to how fast their code works, and also how much resources their code uses. The first parameter is vital for user-experience, the second – for the hosting expenses, as resources usage costs money.

It turns out that there are quite a few suitable profilers for Google App Engine. At least these are the ones I could find:

  • Appstats – no doubt, the most advanced one, giving you information about the timing and costs of your RPC calls to the datastore, Memcache, etc.
    Works for both Python, and Java. Included in the official SDK as of version 1.3.1.
  • appengine-profiler – besides being an RPC profiler like Appstats, appengine-profiler gives you the option to profile the CPU usage of your code, thus you can easily identify hot-spots where your code wastes a lot of CPU resources and wall-clock time. You can define “tracepoints” which surround part of your code blocks, and you’ll easily know the resources usage of these blocks for each page load.
    Works for Python.
  • AppWrench – the Java profiler. I don’t code on Java, and I haven’t tested this profiler, but I’m including it here for all you Java gurus.

Resources:


Validator for the Model key_name property in Google App Engine datastore (Python)

August 4, 2010

The Google App Engine datastore provides convenient data modeling with Python. One important aspect is the validation of the data stored in a Model instance. Each data key-value is stored as a Property which is an attribute of a Model class.

While every Property can be validated automatically by specifying a “validator” function, there is no option for the Model key name to be automatically validated. Note that we can manually specify by our code the value of the key name, and therefore this key name can be considered user-data and must be validated. The key name is by the way the only unique index constraint, similar to the “primary key” in relational databases, which is supported by the Google datastore, and can be specified manually.

Here is my version for a validation function for the Model’s key name:

from google.appengine.ext import db
import re

def ModelKeyNameValidator(self, regexp_string, *args, **kwargs):
	gotKey = None
	className = self.__class__.__name__

	if len(args) >= 2:
		if gotKey: raise Exception('Found key for second time for Model ' + className)
		gotKey = 'args'
		k = args[1] # key_name given as an unnamed argument
	if 'key' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + className)
		gotKey = 'Key'
		k = kwargs['key'].name() # key_name given as Key instance
	if 'key_name' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + className)
		gotKey = 'key_name'
		k = kwargs['key_name'] # key_name given as a keyword argument

	if not gotKey:
		raise Exception('No key found for Model ' + className)

	id = '%s.key_name(%s)' % (self.__class__.__name__, gotKey)
	if (not re.search(regexp_string, k)):
		raise ValueError('(%s) Value "%s" is invalid. It must match the regexp "%s"' % (id, k, regexp_string))

class ClubDB(db.Model):
	# key = url
	def __init__(self, *args, **kwargs):
		ModelKeyNameValidator(self, '^[a-z0-9-]{2,32}$', *args, **kwargs)
		super(self.__class__, self).__init__(*args, **kwargs)

	name = db.StringProperty(required = True)

As you can see, the proposed solution is not versatile enough, and requires you to copy and alter the ModelKeyNameValidator() function again and again for every new validation type. I strictly follow the Don’t Repeat Yourself principle in programming, so after much Googling and struggling with Python, I got to the following solution which I actually use in my projects (click “show source” to see the code):

from google.appengine.ext import db
import re

def re_validator(id, regexp_string):
	def validator(v):
		string_type_validator(v)
		if (not re.search(regexp_string, v)):
			raise ValueError('(%s) Value "%s" is invalid. It must match the regexp "%s"' % (id, v, regexp_string))
	return validator

def length_validator(id, minlen, maxlen):
	def validator(v):
		string_type_validator(v)
		if minlen is not None and len(v) < minlen:
			raise ValueError('(%s) Value "%s" is invalid. It must be more than %s characters' % (id, v, minlen))
		if maxlen is not None and len(v) > maxlen:
			raise ValueError('(%s) Value "%s" is invalid. It must be less than %s characters' % (id, v, maxlen))
	return validator

def ModelKeyValidator(v, self, *args, **kwargs):
	gotKey = None

	if len(args) >= 2:
		if gotKey: raise Exception('Found key for second time for Model ' + self.__class__.__name__)
		gotKey = 'args'
		k = args[1] # key_name given as unnamed argument
	if 'key' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + self.__class__.__name__)
		gotKey = 'Key'
		k = kwargs['key'].name()
	if 'key_name' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + self.__class__.__name__)
		gotKey = 'key_name'
		k = kwargs['key_name']

	if not gotKey:
		raise Exception('No key found for Model ' + self.__class__.__name__)

	v.execute('%s.key_name(%s)' % (self.__class__.__name__, gotKey), k) # validate the key now

class DelayedValidator:
	''' Validator class which allows you to specify the "id" dynamically on validation call '''
	def __init__(self, v, *args): # specify the validation function and its arguments
		self.validatorArgs = args
		self.validatorFunction = v

	def execute(self, id, value):
		if not isinstance(id, basestring):
			raise Exception('No valid ID specified for the Validator object')
		func = self.validatorFunction(id, *(self.validatorArgs)) # get the validator function
		func(value) # do the validation

class ClubDB(db.Model):
	# key = url
	def __init__(self, *args, **kwargs):
		ModelKeyValidator(DelayedValidator(re_validator, '^[a-z0-9-]{2,32}$'), self, *args, **kwargs)
		super(self.__class__, self).__init__(*args, **kwargs)

	name = db.StringProperty(
		required = True,
		validator = length_validator('ClubDB.name', 1, None))

You probably noticed that in the second example I also added a validator for the “name” property too. Note that the re_validator() and length_validator() functions can be re-used. Furthermore, thanks to the DelayedValidator class which accepts a validator function and its arguments as constructor arguments, the ModelKeyValidator class can be re-used without any modifications too.

P.S. It seems that all “validator” functions are executed every time a Model class is being instantiated. This means that no matter if you are updating/creating the data object, or you are simply reading it from the datastore, the assigned values are always validated. This surely wastes some CPU cycles, but for now I have no idea how to easily circumvent this.

Disclaimer: I’m new to Python and Google App Engine. But they seem fun! :) Sorry for the long lines…


Resources:


Follow

Get every new post delivered to your Inbox.