Ivan Zahariev

September 10, 2010
by Ivan Zahariev 4 Comments

Associate Amazon EC2 Elastic IP in a different region

If you allocate an Elastic IP address in the US-East region (N.Virginia), then sorry folks but you can’t use this IP address with an EC2 instance which is located in another region, say EU-West (Ireland) for example.
Remapping it to an EC2 instance in another availability zone within the same region is no problem.

The documentation of Amazon EC2 leaves another impression:

Elastic IP addresses allow you to mask instance or Availability Zone failures by programmatically remapping your public IP addresses to any instance in your account.

This is a bit misleading. I had already started dreaming about how I could move my EC2 instance across continents with no DNS propagation issues. I had also started admiring Amazon for how they had achieved this technically… Well, they haven’t. 🙂

References:

My inquiry about this in the AWS->EC2 forum.

August 31, 2010
by Ivan Zahariev Leave a comment

Google App Engine – Datastore performance, and Memcache behavior

Ever since I’ve been working with Google App Engine, there are two issues which bothered me a lot:

Datastore performance – lots of people have already written about it (see links #1, #2, and #3). Currently, when working with small datasets, it’s far from being comparable even with a slow MySQL database, and you may also occasionally get internal errors, as well as increased latencies. I contacted Google about this, and asked them if the Business customers of GAE who pay for it would get better Datastore performance. Here is what I got as an answer from Nick Johnson, a GAE developer:

Business customers will receive paid support, which is prioritized, as well as the extra features we announced at I/O. System latency is not any different, however, as we try and make the system as fast as possible for all our users.

So the bad news is that you cannot make the Datastore run faster, even if you pay.
The good news is that we are all getting the same service in terms of speed, which is a good thing – when everybody is having difficulties, then the community will eventually find a solution.
Memcache fairness – what happens if another website (on the same server) uses the Memcache service extensively, thus making the Memcache entries of my website expire too quickly, due to the memory pressure. Here is what Nick Johnson from Google replied:

Memcache is segmented by application. Although there is some variation (so that apps that don’t use any memcache don’t take up usable space), every app is guaranteed a fair share of memcache space.

Excellent system design, GAE engineers. Keep up the good work!

Update: Google App Engine engineers continue to do very good work indeed! You should take a look at the new features announced with the 1.3.6 release of GAE.

August 11, 2010
by Ivan Zahariev 2 Comments

USB: rejected 1 configuration due to insufficient available bus power

If your USB device is not being recognized, execute the command “dmesg” and check if the following output is there:

usb 1-1.4: rejected 1 configuration due to insufficient available bus power

The “1-1.4” ID may be different for your configuration.

If, and only if, you are absolutely sure that your USB hub and/or hardware configuration can safely supply enough power, you can override this barrier and force the device to be activated despite the error message. One such situation is when you have manually applied 5V external power to your USB device and/or USB hub, like I did on my Bifferboard.

Here is how you can override the power safety mechanism:

echo 1 > /sys/bus/usb/devices/1-1.4/bConfigurationValue

Replace “1-1.4” with your USB device ID. Be careful and have fun!
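If you would rather not copy the device ID by hand, the following Python 3 sketch shows one way to derive the sysfs path from the dmesg line. The helper name override_path and the regular expression are my own assumptions, based only on the message format quoted above. The sketch only builds the path and deliberately does not write to it.

```python
import re

# Matches the kernel message quoted above and captures the USB device ID.
REJECT_RE = re.compile(
    r"usb (\d+-[\d.]+): rejected \d+ configurations? "
    r"due to insufficient available bus power"
)


def override_path(dmesg_line):
    """Return the bConfigurationValue path for a rejected device, or None."""
    match = REJECT_RE.search(dmesg_line)
    if match is None:
        return None
    return "/sys/bus/usb/devices/%s/bConfigurationValue" % match.group(1)


line = "usb 1-1.4: rejected 1 configuration due to insufficient available bus power"
print(override_path(line))
```

Writing "1" to the printed path is still up to you, and only if the power supply is really safe.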

Resources:

Error insufficient available bus power RT2573.

August 8, 2010
by Ivan Zahariev Leave a comment

Secure NAS on Bifferboard running Debian

This NAS solution uses OpenSSH for secure transport over a TCP connection, and NFS to mount the volume on your local computer. The hardware of the NAS server is the low-cost Bifferboard.

I’m using an external hard disk via USB which is partitioned in two parts – /dev/sda1 (1GB) and the rest in /dev/sda2. Once you have installed Debian on Bifferboard, here are the commands which further transform your Bifferboard into a secure NAS:

apt-get update
apt-get -y install nfs-kernel-server

vi /etc/default/nfs-common 
  # update: STATDOPTS='--port 2231'
vi /etc/default/nfs-kernel-server 
  # update: RPCMOUNTDOPTS='-p 2233'

mkdir -m 700 /root/.ssh
  # add your public key for "root" in /root/.ssh/authorized_keys

echo '/mnt/storage 127.0.0.1(rw,no_root_squash,no_subtree_check,insecure,async)' >> /etc/exports
mkdir /mnt/storage
chattr +i /mnt/storage # so that we don't accidentally write there without a mounted volume

cat > /etc/rc.local <<EOF
#!/bin/bash

# allow only SSH access via the network
/sbin/iptables -P FORWARD DROP
/sbin/iptables -P INPUT DROP
/sbin/iptables -A INPUT -i lo -j ACCEPT
/sbin/iptables -A INPUT -p tcp --dport 22 -j ACCEPT
/sbin/iptables -A INPUT -p tcp -m state --state ESTABLISHED -j ACCEPT # TCP initiated by server
/sbin/iptables -A INPUT -p udp -m state --state ESTABLISHED -j ACCEPT # DNS traffic

# mount the storage volume here, so that any errors with it don't interfere with the system startup
/bin/mount /dev/sda2 /mnt/storage
/etc/init.d/nfs-kernel-server restart
EOF

# allow only public key authentication
fgrep -i -v PasswordAuthentication /etc/ssh/sshd_config > /tmp/sshd_config && \
  mv -f /tmp/sshd_config /etc/ssh/sshd_config && \
  echo 'PasswordAuthentication no' >> /etc/ssh/sshd_config

reboot

There are two things you should consider with this setup:

You must trust the “root” user who mounts the directory! They have full shell access to your NAS.
A not-so-strong SSH encryption cipher is used, in order to improve the performance of the SSH transfer.

On the machine which is being backed up, I use the following script which mounts the NAS volume, starts the rsnapshot backup process and finally unmounts the NAS volume:

#!/bin/bash
set -u

HOST='192.168.100.102'
SSHUSER='root'
REMOTEPORT='22'
REMOTEDIR='/mnt/storage'
LOCALDIR='/mnt/storage'
SSHKEY='/home/famzah/.ssh/id_rsa-home-backups'

echo "Mounting NFS volume on $HOST:$REMOTEPORT (SSH-key='$SSHKEY')."
N=0
for port in 2049 2233 ; do
	N=$(($N + 1))
	LPORT=$((61000 + $N))
	ssh -f -i "$SSHKEY" -c arcfour128 -L 127.0.0.1:"$LPORT":127.0.0.1:"$port" -p "$REMOTEPORT" "$SSHUSER@$HOST" sleep 600d
	echo "Forwarding: $HOST: Local port: $LPORT -> Remote port: $port"
done
sudo mount -t nfs -o noatime,nfsvers=2,proto=tcp,intr,rw,bg,port=61001,mountport=61002 "127.0.0.1:$REMOTEDIR" "$LOCALDIR"

echo "Doing backup."
time sudo /usr/bin/rsnapshot weekly

echo "Unmounting NFS volume and closing SSH tunnels."
sudo umount "$LOCALDIR"
for pid in $(ps axuww|grep ssh|grep 6100|grep arcfour|grep -v grep|awk '{print $2}') ; do
	kill "$pid" # possibly dangerous...
done

Update, 29/Sep/2010 – performance tunes:

Added “async” in “/etc/exports”.
Removed the “rsize=8192,wsize=8192” mount options – they are auto-negotiated by default.
Added the “noatime” mount option.
Put the SSH username in a variable.

Resources:

Tunneling NFS over SSH « HowtoForge.

August 7, 2010
by Ivan Zahariev 5 Comments

Beware of leading zeros in Bash numeric variables

Suppose you have some (user) value in a numeric variable with leading zeros. For example, you number something with zero-padded numbers consisting of 3 digits: 001, 002, 003, and so on. This label is assigned to a Bash variable, named $N.

As long as the numbers stay below 008, or as long as you use the variable only in text interpolations, you’re safe. For example, the following works just fine:

N=016
echo "Value: $N"
# prints "Value: 016"

However… 🙂
If you start using this variable as a number in arithmetic, then you’re in trouble. Here is an example:

N=016
echo $((N + 2))
# result is 16, not the expected 18!
printf %d "$N"
# result is 14, not the expected 16!

You probably already see the pattern – “016” is not treated as a decimal number but as an octal one, because of the leading zero. This is explained in the man page of bash, section “ARITHMETIC EVALUATION” (aka. “Shell Arithmetic”).

In order to force decimal representation and as a side effect also remove any leading zeros for a Bash variable, you need to treat it as follows:

N=016
N=$((10#$N)) # force decimal (base 10)
echo $((N + 2))
# result is 18, ok
printf %d "$N"
# result is 16, ok

Note also that there’s another caveat – forcing the number to decimal base 10 doesn’t actually validate that it contains only [0-9] characters. Read the very last paragraph of the man page of bash, section “ARITHMETIC EVALUATION” (aka. “Shell Arithmetic”), for more details on how digits can be represented by letters and symbols. My tests however show that you can’t operate with invalid numbers in base 10, though I’m no expert here. In order to be on the safe side, I would suggest that you validate your numbers with a strict regular expression, just in case, and if you don’t trust the data input.
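To make the “validate first, then force base 10” advice concrete, here is a minimal sketch in Python 3 rather than Bash. The function name parse_padded_decimal is made up for this illustration. The idea is the same as with 10#$N: reject anything that is not plain digits, then parse with an explicit base so the leading zero cannot switch the interpretation to octal.

```python
import re


def parse_padded_decimal(label):
    """Return the integer value of a zero-padded decimal label such as "016"."""
    # Strict check: ASCII digits only, and at least one of them.
    if not re.fullmatch(r"[0-9]+", label):
        raise ValueError("not a plain decimal number: %r" % (label,))
    # Explicit base 10, the counterpart of Bash's 10#$N.
    return int(label, 10)


print(parse_padded_decimal("016") + 2)  # 18, the decimal reading
print(int("016", 8))                    # 14, the octal reading Bash falls back to
```

In Bash itself the same strict check can be done with a regular expression before applying 10#.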

Resources:

Remove Leading 0’s in Bash « Mike Zupan’s Random Blog

August 4, 2010
by Ivan Zahariev Leave a comment

Validator for the Model key_name property in Google App Engine datastore (Python)

The Google App Engine datastore provides convenient data modeling with Python. One important aspect is the validation of the data stored in a Model instance. Each data key-value is stored as a Property which is an attribute of a Model class.

While every Property can be validated automatically by specifying a “validator” function, there is no option for the Model key name to be validated automatically. Note that our code can set the key name manually, and therefore this key name can be considered user data and must be validated. The key name is, by the way, the only unique index constraint supported by the Google datastore that can be specified manually, similar to the “primary key” in relational databases.

Here is my version for a validation function for the Model’s key name:

from google.appengine.ext import db
import re

def ModelKeyNameValidator(self, regexp_string, *args, **kwargs):
	gotKey = None
	className = self.__class__.__name__

	if len(args) >= 2:
		if gotKey: raise Exception('Found key for second time for Model ' + className)
		gotKey = 'args'
		k = args[1] # key_name given as an unnamed argument
	if 'key' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + className)
		gotKey = 'Key'
		k = kwargs['key'].name() # key_name given as Key instance
	if 'key_name' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + className)
		gotKey = 'key_name'
		k = kwargs['key_name'] # key_name given as a keyword argument

	if not gotKey:
		raise Exception('No key found for Model ' + className)

	id = '%s.key_name(%s)' % (self.__class__.__name__, gotKey)
	if (not re.search(regexp_string, k)):
		raise ValueError('(%s) Value "%s" is invalid. It must match the regexp "%s"' % (id, k, regexp_string))

class ClubDB(db.Model):
	# key = url
	def __init__(self, *args, **kwargs):
		ModelKeyNameValidator(self, '^[a-z0-9-]{2,32}$', *args, **kwargs)
		super(ClubDB, self).__init__(*args, **kwargs) # name the class explicitly; super(self.__class__, ...) recurses forever if ClubDB is ever subclassed

	name = db.StringProperty(required = True)

As you can see, the proposed solution is not versatile enough, and requires you to copy and alter the ModelKeyNameValidator() function again and again for every new validation type. I strictly follow the Don’t Repeat Yourself principle in programming, so after much Googling and struggling with Python, I arrived at the following solution, which I actually use in my projects:

from google.appengine.ext import db
import re

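def string_type_validator(v):
	# NOTE: this helper is called below but was missing from the original listing;
	# this is an assumed minimal version (Python 2, hence "basestring").
	if not isinstance(v, basestring):
		raise ValueError('Value "%s" is not a string' % (v,))

def re_validator(id, regexp_string):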
	def validator(v):
		string_type_validator(v)
		if (not re.search(regexp_string, v)):
			raise ValueError('(%s) Value "%s" is invalid. It must match the regexp "%s"' % (id, v, regexp_string))
	return validator

def length_validator(id, minlen, maxlen):
	def validator(v):
		string_type_validator(v)
		if minlen is not None and len(v) < minlen:
			raise ValueError('(%s) Value "%s" is invalid. It must be at least %s characters' % (id, v, minlen))
		if maxlen is not None and len(v) > maxlen:
			raise ValueError('(%s) Value "%s" is invalid. It must be at most %s characters' % (id, v, maxlen))
	return validator

def ModelKeyValidator(v, self, *args, **kwargs):
	gotKey = None

	if len(args) >= 2:
		if gotKey: raise Exception('Found key for second time for Model ' + self.__class__.__name__)
		gotKey = 'args'
		k = args[1] # key_name given as unnamed argument
	if 'key' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + self.__class__.__name__)
		gotKey = 'Key'
		k = kwargs['key'].name()
	if 'key_name' in kwargs:
		if gotKey: raise Exception('Found key for second time for Model ' + self.__class__.__name__)
		gotKey = 'key_name'
		k = kwargs['key_name']

	if not gotKey:
		raise Exception('No key found for Model ' + self.__class__.__name__)

	v.execute('%s.key_name(%s)' % (self.__class__.__name__, gotKey), k) # validate the key now

class DelayedValidator:
	''' Validator class which allows you to specify the "id" dynamically on validation call '''
	def __init__(self, v, *args): # specify the validation function and its arguments
		self.validatorArgs = args
		self.validatorFunction = v

	def execute(self, id, value):
		if not isinstance(id, basestring):
			raise Exception('No valid ID specified for the Validator object')
		func = self.validatorFunction(id, *(self.validatorArgs)) # get the validator function
		func(value) # do the validation

class ClubDB(db.Model):
	# key = url
	def __init__(self, *args, **kwargs):
		ModelKeyValidator(DelayedValidator(re_validator, '^[a-z0-9-]{2,32}$'), self, *args, **kwargs)
		super(ClubDB, self).__init__(*args, **kwargs) # name the class explicitly; super(self.__class__, ...) recurses forever if ClubDB is ever subclassed

	name = db.StringProperty(
		required = True,
		validator = length_validator('ClubDB.name', 1, None))

You probably noticed that in the second example I also added a validator for the “name” property. Note that the re_validator() and length_validator() functions can be re-used. Furthermore, thanks to the DelayedValidator class, which accepts a validator function and its arguments as constructor arguments, the ModelKeyValidator() function can be re-used without any modifications too.
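The delayed-binding idea can be tried without App Engine at all. The following is a stripped-down Python 3 sketch of the same pattern, not the code from my projects: the parameter names (label, pattern, factory) are chosen for this illustration, and the plain str check stands in for the string type validation.

```python
import re


def re_validator(label, pattern):
    """Build a validator which checks a string against a regular expression."""
    def validator(value):
        if not isinstance(value, str):
            raise ValueError('(%s) Value %r is not a string' % (label, value))
        if not re.search(pattern, value):
            raise ValueError('(%s) Value "%s" is invalid. It must match the regexp "%s"'
                             % (label, value, pattern))
    return validator


class DelayedValidator(object):
    """Stores a validator factory and its arguments; the label comes later."""
    def __init__(self, factory, *args):
        self.factory = factory
        self.args = args

    def execute(self, label, value):
        # Build the real validator only now, when the label is known.
        self.factory(label, *self.args)(value)


key_check = DelayedValidator(re_validator, r'^[a-z0-9-]{2,32}$')
key_check.execute('ClubDB.key_name(key_name)', 'my-club')  # valid, returns silently
try:
    key_check.execute('ClubDB.key_name(key_name)', 'Not Valid!')
except ValueError as err:
    print(err)
```

The factory is stored together with its fixed arguments, and the label is supplied only when execute() is called. This is what lets one validator definition serve both a Property and the key name.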

P.S. It seems that all “validator” functions are executed every time a Model class is being instantiated. This means that no matter if you are updating/creating the data object, or you are simply reading it from the datastore, the assigned values are always validated. This surely wastes some CPU cycles, but for now I have no idea how to easily circumvent this.

Disclaimer: I’m new to Python and Google App Engine. But they seem fun! 🙂 Sorry for the long lines…

Resources:

Google App Engine: Model integrity constraints? << Stack Overflow

August 2, 2010
by Ivan Zahariev 6 Comments

C++ vs. Python vs. Perl vs. PHP performance benchmark (part #2)

This time we will focus on the startup time. The process start time is important if your processes are not persistent. If you are using FastCGI, mod_perl, mod_php, or mod_python, then these statistics are not so important to you. However, if you are spawning many processes which do something small and live for a very short time, then you should consider the CPU resources which get wasted while the script interpreter is being initialized.

The benchmarked scripts do only one thing – say “Hello, world” on the standard output. They do not include any additional modules in their source code – this may or may not be your use-case. That said, scripting languages very often have quite a lot of built-in functions, and for simple tasks you never need to include other modules.

Here are the benchmark results:

Language	User CPU (s)	System CPU (s)	Total (s)	Slower than C++	Slower than previous
C++ (with or w/o optimization)	2.568	3.536	6.051	–	–
Perl	12.561	6.096	18.723	209%	209%
PHP (w/o php.ini)	20.473	13.877	34.918	477%	86%
Python	27.014	11.881	39.318	550%	13%
Python + Psyco	32.986	14.845	48.132	695%	22%

The clear winner among the scripting languages this time is… Perl. 🙂

All scripts were invoked 3000 times using the following Bash loop:

time ( i=3000 ; while [ "$i" -gt 0 ]; do $CMD >/dev/null ; i=$(($i-1)); done )
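If you want to script a similar measurement, here is a rough Python 3 sketch of the same idea. It is not the harness I used for the numbers above. The helper names time_invocations and slower_than are made up for this example, and the children_user/children_system fields of os.times() are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys


def time_invocations(cmd, count):
    """Run cmd `count` times and return the (user, system) CPU seconds
    consumed by the child processes, as reported by os.times()."""
    before = os.times()
    for _ in range(count):
        # Discard the output, like ">/dev/null" in the Bash loop.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    after = os.times()
    return (after.children_user - before.children_user,
            after.children_system - before.children_system)


def slower_than(total, reference):
    """Percentage by which `total` exceeds `reference`."""
    return (total / reference - 1.0) * 100.0


# Example: time a trivial interpreter start-up a few times.
user, system = time_invocations([sys.executable, "-c", "pass"], 5)
print("user=%.3fs system=%.3fs" % (user, system))

# The "slower than" columns follow from the totals, e.g. Perl vs. C++.
print("%.0f%%" % slower_than(18.723, 6.051))
```

A count of 5 is only for a quick try; use a few thousand invocations, as in the Bash loop, to get stable numbers.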

All tests were done on a Kubuntu Lucid box. The versions of the used software packages follow:

g++ (GNU project C and C++ compiler) 4.4.3
Python 2.6.5
Python Psyco 1.6 (1ubuntu2)
Perl 5.10.1
PHP 5.3.2 (1ubuntu4.2 with Suhosin-Patch), Zend Engine 2.3.0

The C++ implementation follows:

#include <iostream>
using namespace std;

int main() {
	cout << "Hello, world!\n";
	return 0;
}

The Perl implementation follows:

use strict;
use warnings;

print "Hello, world!\n";

The PHP implementation follows:

<?php
echo "Hello, world!\n";

The Python implementation follows:

#import psyco
#psyco.full()

print 'Hello, world!'

Update (Jan/14/2012): Copied the used test environment info here.

August 1, 2010
by Ivan Zahariev Leave a comment

Speed up RRDtool database manipulations via RRDs (Perl)

Use case
You are doing a lot of data operations on your RRD files (create, update, fetch, last), and every update is done by a separate Perl process which lives a very short time – the process is launched, it updates or reads the data, does something else, and then exits.

The problem
If you are using RRDtool and Perl as described, you have surely noticed that running many of these processes wastes a lot of CPU resources. The question is – can we optimize this and lessen the performance hit of loading the RRDs library into Perl? We know that launching Perl often is quite expensive in itself, but after all, if we chose to work with Perl, this is a price we should be ready to pay.

The RRDtool shared library is a monolithic piece of code which provides ALL functions of the RRDtool suite – data manipulation, graphics and import/export tools. The last two components bring in huge dependencies on other shared libraries. The library from RRDtool version 1.4.4 depends on 34 other libraries on my Linux box! This must add to the loading time of the RRDtool library into Perl.

Resolution and benchmarks
In order to prove my theory (actually, it was more a theory of zImage, and I just followed, enhanced and tried it), I commented out the implementation of the “graphics” and “import/export tools” modules from the source code of RRDtool. Then I re-compiled the library and did some performance benchmarks. I also re-implemented the RRDs.pm module by replacing the DynaLoader module with the XSLoader one. This made no difference in performance whatsoever. The re-compiled RRD library depends on only 4 other libraries – linux-gate.so.1, libm.so.6, libc.so.6, and /lib/ld-linux.so.2. I think this is the most we can cut down. 🙂

So here are the benchmark results. They show the accumulated time for 1000 invocations of the Perl interpreter with three different configurations:

Only Perl (baseline): 5.454s.
With RRDs, no graphics or import/export functions: 9.744s (+4.290s) +78%.
With standard RRDs: 11.647s (+6.192s) +113%.

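As you can see, stripping the library saves about 1.9s per 1000 invocations, which is 35% of the bare Perl start-up time (the overhead drops from +113% to +78%). The time spent loading RRDs itself drops from 6.192s to 4.290s, so the standard library takes 44% longer to load than the stripped one.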
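The percentages depend on what you divide by, so here is a small Python 3 sketch which re-derives them from the three timings listed above. The variable names are chosen for this illustration only. The rounded timings give 6.193s for the standard overhead, while the list shows 6.192s, presumably computed from unrounded values.

```python
baseline = 5.454   # only Perl
stripped = 9.744   # RRDs without graphics and import/export
standard = 11.647  # standard RRDs

overhead_stripped = stripped - baseline  # time spent loading the stripped RRDs
overhead_standard = standard - baseline  # time spent loading the standard RRDs
saving = standard - stripped             # what the stripped library saves

# Saving measured against the bare Perl start-up time.
print("%.0f%% of the Perl baseline" % (100.0 * saving / baseline))
# Saving measured against the full Perl + standard RRDs start-up time.
print("%.0f%% of the Perl + RRDs total" % (100.0 * saving / standard))
# How much longer the standard library takes to load than the stripped one.
print("%.0f%% longer load" % (100.0 * (overhead_standard / overhead_stripped - 1.0)))
```

The first and third figures are the 35% and 44% quoted above; measured against the whole Perl + RRDs start-up time, the saving is smaller.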

Here are the commands I used for the benchmarks:

Only Perl (baseline): time ( i=1000 ; while [ "$i" -gt 0 ]; do perl -Mwarnings -Mstrict -e '' ; i=$(($i-1)); done )
Perl + RRDs: time ( i=1000 ; while [ "$i" -gt 0 ]; do perl -Mwarnings -Mstrict -MRRDs -e '' ; i=$(($i-1)); done )

July 9, 2010
by Ivan Zahariev Leave a comment

Free SSL certificates

More and more people have started telling me about the StartSSL certificate authority, which is a subsidiary of StartCom. The rumor that they give away free SSL certificates sounded too good to be true, so I decided to look into it more carefully.

After much reading on their site, what people say was confirmed – StartSSL really do issue SSL certificates for free when they are going to be used by individuals on their own websites. This means that your personal name appears in the SSL certificate information, which anyone can review by clicking on the SSL bar in the web browser.

Businesses and other legal entities verify their company’s information once for an annual fee and can then issue an unlimited number of SSL certificates too, including wildcard ones. Once verified, a business customer can purchase EV certificates for US$ 49.90 per year.

You can compare these prices with any other SSL certificate authority and you’ll see for yourself that StartSSL are the most affordable one, and the only one which doesn’t charge you for what doesn’t cost them money either – that’s why they can offer “loosely verified” SSL certificates for personal websites for free. It’s unbelievable but true.

My IT brain immediately started to doubt the technical side. I had to check if web browsers accept these SSL certificates without issuing an SSL warning about the certificate being signed by an unknown SSL authority. The test results were successful and the SSL root authority of StartSSL was recognized by the latest version of:

Internet Explorer 8 on Windows.
Chrome on Windows.
Firefox on Windows and Linux.
Chromium on Linux.

Furthermore, the Debian “lenny”, “squeeze” and Ubuntu Lucid CA repositories also recognize the StartSSL root certificate. You can verify this yourself by the following command:
openssl s_client -CApath /etc/ssl/certs -connect startssl.com:443

StartSSL have a long list with platforms and browsers which recognize their certificates. You can review the list at the products comparison page.

No more self-signed SSL certificates for personal use, hurray! 🙂

Update 29/Nov/2010: If you’re interested, you can also review my success story with the Support staff of StartSSL.

July 5, 2010
by Ivan Zahariev 34 Comments

The Super Micro IPMI Console + Java are killing me

I don’t know whether Java or the Super Micro IPMI developers are to blame, or both. One thing is for sure – I rarely need it, but almost every time I want to use the server-critical “Console Redirection” feature on our Super Micro servers, there is some problem with the Java applet. Thus I’m not able to access the remote console of the server quickly, which in turn gives me a real headache.

Today, it’s the “Launch Console” button doing absolutely nothing on my Kubuntu desktop – no errors, no action after clicking it, nothing. I (always) have a “backup option” – a Windows 7 virtual machine running on my desktop, as Java tends to work better for me on Windows (cross-platform, eh?). Same problem on Windows too. As I’m really paranoid about having a backup, I have a backup of the “backup option” – X over VNC, running on some not-so-bleeding-edge Linux machines, in order to have a “stable” Java installation there. Java failed on them today as well, though, as they are running Debian “lenny”, which seems to have the latest Java version 1.6.20 too.

Well… sorry Java applets + Super Micro IPMI, you really disappoint me!

27/Mar/2012: Resolution: Use the IPMIView application which does not rely on web browsers. Tested with Java Version 6 Update 31 (build 1.6.0_31) on Windows 7. Note that IPMIView does not provide a KVM console for older versions of the Super Micro IPMI devices — the good news is that those devices work well within a web browser. 🙂

The (ugly) fix is to downgrade your Java to 1.6.19 (and disable automatic Java updates):
http://www.webhostingtalk.com/showthread.php?t=953055

Update #1: I downgraded to Java 1.6.19 on my Windows 7 by:

Uninstalling the Java 1.6.20 JRE update.
Installing the Java 1.6.19 JRE update which I downloaded from the “Archive: Java[tm] Technology Products Download” page.
Being able to get this working only with Chrome. Firefox and IE 8 failed to work.

Update #2: Linux doesn’t seem to be having any problems. Firefox 3.6.3 on Ubuntu and Gentoo with Sun Java 1.6.20 works fine.

Update #3: If you upgrade the IPMI firmware to version 2.02, the Windows problem is fixed.

Here is some debug info from the Debian “lenny” Iceweasel browser, the only one which issued an error:

Unable to launch ATEN Java iKVM Viewer.
An error occurred while launching/running the application.

Title: ATEN Java iKVM Viewer
Vendor: ATEN
Category: Download Error

Unable to load resource: (https://%IP%/iKVM.jar, 1.56.3.0x0)

Wrapped Exception: java.io.IOException: HTTP response 404.

At the same time, the Java test page works fine. The version of the Debian “lenny” “sun-java6-jre” package is “6-20-01lenny1” (Java JRE 1.6.20).

The same problem is reproduced on:

Windows 7, running Java 1.6.20, under IE 8, Firefox 3.6.3 and Chrome 5.0.375.99.
Kubuntu Lucid, running OpenJDK 6 build b18, under Firefox 3.6.3.

The Firmware Revision of the IPMI interface on the X8DTL motherboard is 01.29, dated 2010-01-06. It’s not the latest one, but surely not a very old one. After all, you can’t reboot your production servers for every IPMI firmware release…

Anyway, I try not to write articles with negative attitude, but this time I just couldn’t resist.
Java, Java, Java… 🙂

/contrib/famzah

Enthusiasm never stops

Author Archives: Ivan Zahariev
