Getting “500 Line too long (limit is 4096)” error in Perl

September 11, 2010

The error may also be “500 Line too long (limit is 8192)” but the problem is still the same – LWP or SOAP::Lite return this error when you try to POST or GET something very long.

The one to blame is actually Net::HTTP::Methods, which LWP loads through Net::HTTP and which limits the length of a single line it reads from the server.

It took me a few hours to get this resolved:

use LWP::Protocol::http; # to suppress the warning "possible typo" in the next statement
push(@LWP::Protocol::http::EXTRA_SOCK_OPTS, MaxLineLength => 0); # to remove the limit

Put the above code in your Perl HTTP client and you’re good to go!
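To show where the two lines fit, here is a minimal sketch of a complete client. The fetch() helper, the 30-second timeout and the example URL are assumptions made for illustration only; the important part is that the option is pushed once, before the first request opens a socket.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use LWP::Protocol::http; # declares @EXTRA_SOCK_OPTS

# Lift the limit once, at startup, before the first request opens a socket.
push(@LWP::Protocol::http::EXTRA_SOCK_OPTS, MaxLineLength => 0);

# Hypothetical helper: fetch $url and return the response body,
# or die with the HTTP status line.
sub fetch {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->get($url);
    die "Request failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Usage: perl client.pl http://www.example.com/   (the URL is a placeholder)
print length(fetch($ARGV[0])), " characters received\n" if @ARGV;
```

Note that the setting is global for the process: every HTTP connection made by LWP afterwards runs without the limit.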


C++ vs. Python vs. Perl vs. PHP performance benchmark (part #2)

August 2, 2010

This time we will focus on the startup time. The process start time is important if your processes are not persistent. If you are using FastCGI, mod_perl, mod_php, or mod_python, then these statistics are not so important to you. However, if you are spawning many processes which do something small and live for a very short time, then you should consider the CPU resources which get wasted while the script interpreter is being initialized.

The benchmarked scripts do only one thing – say “Hello, world” on the standard output. They do not include any additional modules in their source code – this may or may not be your use case. Still, scripting languages usually have plenty of built-in functions, and for simple tasks you often never need to include other modules.

Here are the benchmark results:

Language                          CPU user   CPU system   CPU total   Slower than C++   Slower than previous
C++ (with or w/o optimization)       2.568        3.536       6.051   -                 -
Perl                                12.561        6.096      18.723   209%              209%
PHP (w/o php.ini)                   20.473       13.877      34.918   477%              86%
Python                              27.014       11.881      39.318   550%              13%
Python + Psyco                      32.986       14.845      48.132   695%              22%

The clear winner among the script languages this time is… Perl. :)

All scripts were invoked 3000 times using the following Bash loop:

time ( i=3000 ; while [ "$i" -gt 0 ]; do $CMD >/dev/null ; i=$(($i-1)); done )
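The same kind of measurement can also be sketched in Perl itself. This is not the script used for the results above, only an illustration: the child_cpu_time() helper and the default of 100 runs are assumptions. It relies on the built-in times() function, which reports the user and system CPU time of child processes that have been waited for. Unlike the Bash loop, it does not count the CPU time of the loop itself.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run @cmd $runs times with STDOUT discarded and
# return the user and system CPU seconds consumed by the child processes.
sub child_cpu_time {
    my ($runs, @cmd) = @_;
    my (undef, undef, $cuser_before, $csys_before) = times();
    for (1 .. $runs) {
        my $pid = fork();
        die "Cannot fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: throw away the output and become the benchmarked command.
            open(STDOUT, '>', '/dev/null') or die "Cannot redirect STDOUT: $!";
            exec { $cmd[0] } @cmd
                or do { warn "Cannot exec $cmd[0]: $!\n"; POSIX::_exit(127); };
        }
        waitpid($pid, 0); # child CPU times are accounted only after the wait
    }
    my (undef, undef, $cuser_after, $csys_after) = times();
    return ($cuser_after - $cuser_before, $csys_after - $csys_before);
}

# Benchmark the command given on the command line, or a Perl one-liner.
my @cmd = @ARGV ? @ARGV : ($^X, '-e', 'print "Hello, world!\n"');
my ($user, $system) = child_cpu_time(100, @cmd);
printf "user %.3fs, system %.3fs, total %.3fs\n", $user, $system, $user + $system;
```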

All tests were done on a Kubuntu Lucid box. The versions of the used software packages follow:

  • g++ (GNU project C and C++ compiler) 4.4.3
  • Python 2.6.5
  • Python Psyco 1.6 (1ubuntu2)
  • Perl 5.10.1
  • PHP 5.3.2 (1ubuntu4.2 with Suhosin-Patch), Zend Engine 2.3.0

The C++ implementation follows:

#include <iostream>
using namespace std;

int main() {
	cout << "Hello, world!\n";
	return 0;
}

The Perl implementation follows:

use strict;
use warnings;

print "Hello, world!\n";

The PHP implementation follows:

<?php
echo "Hello, world!\n";

The Python implementation follows:

#import psyco
#psyco.full()

print 'Hello, world!'


Update (Jan/14/2012): Copied the used test environment info here.


Speed up RRDtool database manipulations via RRDs (Perl)

August 1, 2010

Use case
You are doing a lot of data operations on your RRD files (create, update, fetch, last), and every update is done by a separate Perl process which lives a very short time – the process is launched, it updates or reads the data, does something else, and then exits.

The problem
If you are using RRDtool and Perl as described, you have surely noticed that running many of these processes wastes a lot of CPU resources. The question is – can we do some performance optimizations, and lessen the performance hit of loading the RRDs library into Perl? We know that launching Perl often is itself quite expensive, but after all, if we chose to work with Perl, this is a price we should be ready to pay.

The RRDtool shared library is a monolithic piece of code which provides ALL functions of the RRDtool suite – data manipulation, graphics and import/export tools. The last two components bring in a huge number of dependencies on other shared libraries. The library from RRDtool version 1.4.4 depends on 34 other libraries on my Linux box! This must add to the loading time of the RRDtool library into Perl.
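If you want to count the dependencies on your own box, a small sketch like the one below will do, assuming the ldd utility is available. The shared_lib_deps() helper is made up for this illustration, and the script falls back to the Perl binary itself only so that it runs without arguments; pass the real path of your RRDtool library instead.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that ldd prints for $path,
# one per shared library that gets loaded together with it.
sub shared_lib_deps {
    my ($path) = @_;
    open(my $ldd, '-|', 'ldd', $path) or die "Cannot run ldd: $!";
    my @deps = grep { /\S/ } <$ldd>;
    close($ldd);
    chomp @deps;
    s/^\s+// for @deps;
    return @deps;
}

# Pass the path of the library as the first argument; the Perl binary
# is used as a stand-in when no path is given.
my $lib = @ARGV ? $ARGV[0] : $^X;
my @deps = shared_lib_deps($lib);
printf "%s depends on %d other libraries\n", $lib, scalar @deps;
print "  $_\n" for @deps;
```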

Resolution and benchmarks
In order to prove my theory (actually, it was more a theory of zImage, and I just followed, enhanced and tried it), I commented out the implementation of the “graphics” and “import/export tools” modules from the source code of RRDtool. Then I re-compiled the library and did some performance benchmarks. I also re-implemented the RRDs.pm module by replacing the DynaLoader module with the XSLoader one. This made no difference in performance whatsoever. The re-compiled RRD library depends on only 4 other libraries – linux-gate.so.1, libm.so.6, libc.so.6, and /lib/ld-linux.so.2. I think this is the most we can cut down. :)

So here are the benchmark results. They show the accumulated time for 1000 invocations of the Perl interpreter with three different configurations:

  • Only Perl (baseline): 5.454s.
  • With RRDs, no graphics or import/export functions: 9.744s (+4.290s) +78%.
  • With standard RRDs: 11.647s (+6.192s) +113%.

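As you can see, trimming the library cuts the overhead of loading RRDs from +113% to +78% of the baseline, a difference of 35 percentage points. Loading the standard RRDs takes 44% longer than loading the trimmed one (6.192s vs. 4.290s).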
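The percentages can be re-derived from the three measured times. The sketch below is only a worked calculation; the hash keys and helper names are made up for the illustration, and the last digit may differ from the figures above because those were presumably computed from unrounded measurements.

```perl
use strict;
use warnings;

# Accumulated times in seconds for 1000 interpreter starts, from the list above.
my %time = (
    perl_only    => 5.454,
    rrds_trimmed => 9.744,
    rrds_full    => 11.647,
);

# Extra time that loading RRDs adds on top of the bare interpreter.
sub overhead {
    my ($total) = @_;
    return $total - $time{perl_only};
}

# Express $part as a percentage of $whole.
sub percent {
    my ($part, $whole) = @_;
    return 100 * $part / $whole;
}

my $trimmed = overhead($time{rrds_trimmed});
my $full    = overhead($time{rrds_full});

printf "Trimmed RRDs overhead:  +%.3fs (+%.1f%% of the baseline)\n",
    $trimmed, percent($trimmed, $time{perl_only});
printf "Standard RRDs overhead: +%.3fs (+%.1f%% of the baseline)\n",
    $full, percent($full, $time{perl_only});
printf "Standard overhead is %.1f%% larger than the trimmed one\n",
    percent($full - $trimmed, $trimmed);
printf "Total start time saved by trimming: %.1f%%\n",
    percent($time{rrds_full} - $time{rrds_trimmed}, $time{rrds_full});
```

The last line puts the gain in terms of the whole "Perl + RRDs" start time, which is the number that matters if you only care about how long each process takes to come up.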


Here are the commands I used for the benchmarks:

  • Only Perl (baseline): time ( i=1000 ; while [ "$i" -gt 0 ]; do perl -Mwarnings -Mstrict -e '' ; i=$(($i-1)); done )
  • Perl + RRDs: time ( i=1000 ; while [ "$i" -gt 0 ]; do perl -Mwarnings -Mstrict -MRRDs -e '' ; i=$(($i-1)); done )

C++ vs. Python vs. Perl vs. PHP performance benchmark

July 1, 2010

Update: There is a part #2 of the benchmark results.


This all began when a colleague of mine stated that Python was so damn slow for maths. That really astonished me and made me check it out, as my father once told me that he was very satisfied with Python because it was very maths oriented.

The benchmarks here do not try to be complete, as they show the performance of the languages in one aspect only, namely: loops, arrays with numbers, basic math operations.

Note: Give your ideas and use-cases on what to benchmark, and I’ll try to implement it for you. E.g. “benchmark the languages for reading a file, then splitting it to tokens by white-space and finally outputting all unique elements and their count”.

Out of curiosity, Python was also benchmarked with and without the Psyco Python extension, which people say could greatly speed up the execution of any Python code without any modifications.

Here are the benchmark results:

Language                   CPU user   CPU system   CPU total   Slower than C++   Slower than previous   Language version
C++ (optimized with -O2)      1.520        0.188       1.708   -                 -                      g++ 4.5.2
C++ (not optimized)           3.208        0.184       3.392   99%               99%                    g++ 4.5.2
Javascript (nodejs)           3.096        0.384       3.480   104%              3%                     0.2.6
Java                          8.521        0.192       8.713   410%              150%                   1.6.0_26
Python + Psyco               13.305        0.152      13.457   688%              54%                    2.6.6
Python                       27.886        0.168      28.054   1543%             108%                   2.7.1
Perl                         41.671        0.100      41.771   2346%             49%                    5.10.1
PHP                          94.622        0.364      94.986   5461%             127%                   5.3.5

The clear winner among the script languages is… Python. :)

NodeJS JavaScript is pretty fast too, but internally it works more like a compiled language. See the comments below.

The times include the interpretation/parsing phase for each language, but it’s so small that its significance is negligible. The math function is called 10 times, in order to have more reliable results. All scripts are using the very same algorithm to calculate the prime numbers in a given range. The correctness of the implementation is not so important, as we just want to check how fast the languages perform. The original Python algorithm was taken from http://www.daniweb.com/code/snippet216871.html.
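To give an idea of the kind of workload involved (loops, arrays with numbers, basic math), here is a small Perl sketch that finds the prime numbers up to a given limit. It is a plain sieve of Eratosthenes written for this illustration; it is an assumption about the style of the code, not the benchmarked implementation, which you can get from the download link below. The primes_up_to() name and the limit of 100 are made up as well.

```perl
use strict;
use warnings;

# Return all prime numbers in the range 2 .. $n using a sieve of Eratosthenes.
sub primes_up_to {
    my ($n) = @_;
    return () if $n < 2;
    my @is_composite = (0) x ($n + 1);
    my @primes;
    for my $i (2 .. $n) {
        next if $is_composite[$i];
        push @primes, $i;
        # Cross out the multiples of $i; smaller multiples have
        # already been crossed out by smaller primes.
        for (my $j = $i * $i; $j <= $n; $j += $i) {
            $is_composite[$j] = 1;
        }
    }
    return @primes;
}

# Repeat the calculation 10 times, as the benchmark does with its math function.
my @primes;
@primes = primes_up_to(100) for 1 .. 10;
print "Found ", scalar(@primes), " primes, the last one is $primes[-1]\n";
```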

The tests were run on an Ubuntu Linux machine.

You can download the source codes, an Excel results sheet, and the benchmark batch script at:
http://www.famzah.net/download/langs-performance/


Update (Jul/24/2010): Added the C++ optimized values.
Update (Aug/02/2010): Added a link to the benchmarks, part #2.
Update (Mar/31/2011): Using range() in PHP improves performance by 5%.
Update (Jan/14/2012): Re-organized the results summary table and the page. Added Java.


Migrate your TWiki to Google Sites (using Google Sites API and Perl)

May 30, 2010

If you want to transfer your existing TWiki webs to Google Sites, you can do it automatically with the power of the Google Sites API and Perl.

You can download the Perl script, which I used to export my TWiki webs and then import them in Google Sites, at the following page: http://www.famzah.net/download/google-api/twiki2googlesites.pl

Note that it is in no way a complete migration solution. You can use it as a demonstration or a base for interacting with Google API features using Perl.

Now you know that you can use Perl to interact with Google APIs. Go build your own scripts!

Update: Google released Python command line tools for the Google Data APIs (GoogleCL). They seem promising and very easy to use for simple automation tasks.

P.S. If you’re limited on time and your TWiki has relatively few pages, you are probably better off migrating it manually with copy/paste than writing your own migration scripts. Believe me. :)


Perl API Kit for ResellerClub (DirectI)

May 26, 2010

ResellerClub offers a SOAP/WSDL API, in addition to their Online Control Panel, which lets you automate some of your tasks or integrate it directly with your website.

They claim to support a Perl API Kit, but it doesn’t work out of the box for me. Whenever I make an API call, I get the following:

soapenv:Server.userException java.lang.Exception: Body not found.

There is a similar bug report at Web Hosting Talk too.

After a few hours of struggling with SOAP::Lite, reading sources, and some trial and error, I finally was able to make the API work in Perl! :D

If you want to try my version of their Perl API Kit, you have to execute the following:

wget --no-verbose http://www.famzah.net/download/resellerclub/resellerclub-api.tgz
tar -zxf resellerclub-api.tgz
cd resellerclub-api

vi example.pl # edit your username/password
./example.pl

In order to build my version of the Perl API Kit yourself, execute the following commands:

mkdir resellerclub-api
cd resellerclub-api
wget --no-verbose http://www.famzah.net/download/resellerclub/setup.sh
wget --no-verbose http://www.famzah.net/download/resellerclub/example.pl
chmod +x setup.sh example.pl
./setup.sh

vi example.pl # edit your username/password
./example.pl

The scripts use some Debian/Ubuntu specific “apt-get” commands to install the required Perl and system packages, but this can easily be ported to other *nix systems too.


Auto-flush both STDOUT and STDERR in Perl

April 8, 2010

Q: Why is Perl warn() or other STDERR output not shown/logged/saved/flushed into my log file?
A: You may have encountered the well-known feature of stream buffering which is enabled by default.

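An excerpt from the perlvar documentation says that “…STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise”. Thus STDOUT is always buffered in one way or another. Perl normally leaves STDERR unbuffered, but setting auto-flush on it explicitly costs nothing and removes any doubt.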

Usually people remember to set STDOUT as auto-flush, but you should enable this for STDERR as well, or else your messages to STDERR may not appear in your log file immediately, if you are redirecting STDERR to a file.

The following piece of code sets an auto-flush for both STDOUT and STDERR:

select(STDERR);
$| = 1;
select(STDOUT); # default
$| = 1;

The select() function and the $| variable are built-in for Perl and require no additional libraries to be included.

Alternatively, you can also use IO::Handle to achieve the same result:

use IO::Handle;
STDERR->autoflush(1);
STDOUT->autoflush(1);
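You can see the effect of buffering with any file handle, not only with the standard ones. The sketch below is just a demonstration under the assumption that output to a regular file is block buffered: it writes to a temporary file and checks the file size before and after auto-flush is enabled. The variable names and the temporary file are made up for the illustration.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# A temporary file stands in for the log file.
my ($log, $path) = tempfile(UNLINK => 1);

# The first line stays in Perl's buffer, so the file is still empty.
print {$log} "buffered line\n";
my $size_before = -s $path;

# Turning auto-flush on writes out what is pending,
# and every later print reaches the file right away.
$log->autoflush(1);
print {$log} "flushed line\n";
my $size_after = -s $path;

print "Size before auto-flush: $size_before bytes\n";
print "Size after auto-flush:  $size_after bytes\n";
```

This is exactly what happens to a log file which you watch with "tail -f" while the script is still running: without auto-flush the lines show up late and in big chunks.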

I never understood why stream buffering for STDOUT and STDERR is enabled by default for most scripting languages… But that’s just me.



