URL escape in Bash

August 25, 2011

I recently needed to escape some user-supplied input for a URL variable in a Bash script. This is what PHP's urlencode() and Perl's URI::Escape::uri_escape() functions do, for example. My initial approach was to call Perl from the Bash script:

#!/bin/bash
function urlencode() {
	# printf is used instead of "echo -n", which would swallow inputs such as "-e"
	printf '%s' "$1" | perl -MURI::Escape -ne 'print uri_escape($_)'
}

However, I wanted to optimize the Bash script by not having to fork() a Perl interpreter on every call, which can be CPU-intensive if the script is executed often. So I ended up with the following solution, coded entirely in Bash, using Bash string manipulation and Bash associative arrays:

#!/bin/bash
set -u

declare -A ord_hash # associative array; requires Bash version 4

function init_urlencode() {
	# this is the whole ASCII set, without the chr(0) and chr(255) characters
	ASCII='...!"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ЂЃ‚ѓ„…†‡€‰Љ‹ЊЌЋЏђ‘’“”•–—˜™љ›њќћџ ЎўЈ¤Ґ¦§Ё©Є«¬­®Ї°±Ііґµ¶·ё№є»јЅѕїАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэю...'
	# download the script, don't copy paste it from the blog page!

	# chr(0) cannot be stored in a Bash variable

	local idx
	for idx in {0..253}; do # 0..253 = 254 elements = length($ASCII)
		local c="${ASCII:$idx:1}" # VERY SLOW
		local store_idx=$(($idx+1))
		ord_hash["$c"]="$store_idx"
		# chr(255) cannot be used as a key
	done
}

function urlencode() {
	local inp="$1"
	local len="${#inp}"
	local n=0
	local val
	while [ "$n" -lt "$len" ]; do
		local c="${inp:$n:1}" # VERY SLOW
		if [ "$c" == "я" ]; then # chr(255) cannot be used as a key
			val=255
		else
			val="${ord_hash[$c]}"
		fi
		printf '%%%02X' "$val"
		n=$((n+1))
	done
}

init_urlencode # call only once
urlencode 'some^fancy#text'

The logic works pretty well, but the performance is terrible. It turned out that Bash string manipulation is rather slow, so I finally went back to calling Perl, the same way I did initially. For very small strings, on the order of a few characters, the Bash version should be fine. For anything else, this implementation is not recommended.

If you still want to use the Bash code, please download it directly from here, because the blog page messed up some of the special ASCII characters.
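As a side note, the lookup table can be avoided altogether. The following is a minimal sketch of a different technique, not the script above: the printf builtin prints the numeric code of a character when the argument starts with a single quote. The function name urlencode_printf is made up for this example, and I assume LC_ALL=C so that the string is walked byte by byte. Unlike the script above, it leaves letters, digits and "-._~" unencoded. It still loops over the string one character at a time and I have not benchmarked it, and I have only checked it with ASCII input.

```shell
#!/bin/bash

# Hypothetical helper: percent-encode "$1" without an ord() lookup table.
urlencode_printf() {
	local LC_ALL=C # assumption: process the input byte by byte
	local inp="$1" out='' c code n
	for (( n = 0; n < ${#inp}; n++ )); do
		c="${inp:n:1}"
		case "$c" in
			[a-zA-Z0-9._~-])
				out+="$c" # unreserved characters are copied as-is
				;;
			*)
				# a leading single quote makes printf print the character code
				printf -v code '%d' "'$c"
				# defensive mask: keep the value within one byte
				printf -v c '%%%02X' "$((code & 0xFF))"
				out+="$c"
				;;
		esac
	done
	printf '%s\n' "$out"
}

urlencode_printf 'some^fancy#text'
# prints: some%5Efancy%23text
```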


Backup Google Sites automatically

August 2, 2011

I just found out how to make my Google Sites backup script almost non-interactive, so I decided to share it. I run the script every month in the Linux console, and the weekly backup of my hard disk then backs up the exported files as well.

Why bother backing up Google Sites?
While Google is very reliable and will probably never fail me here, I want to have an offline backup of my Google Sites pages in case someone steals my Google Account. So I back up. Online and offline, every week.

The backup script uses the wonderful free Java application “Google Sites Liberation”. My script is really just a sample Bash usage of this Java tool. You need to download the .jar file and store it in the same directory as the backup script. The source code follows:

#!/bin/bash
set -e
set -u
set -o pipefail

trap 'echo "ERROR: Abnormal exit." >&2' ERR

# config BEGIN

GUSER='username@gmail.com'
WIKI_LIST='wiki1 wiki2 wiki3'
JAR_BIN='google-sites-liberation-1.0.4.jar'
ROOT_BACKUP_DIR='./sites.google.com'

# config END

echo "We are using '$JAR_BIN'. Check for a newer version:"
echo '	http://code.google.com/p/google-sites-liberation/downloads/list'
read

echo "The directory '$ROOT_BACKUP_DIR' will be deleted!!!"
echo 'Press Enter to confirm.'
read

rm -rf "$ROOT_BACKUP_DIR"
mkdir "$ROOT_BACKUP_DIR"

echo -n "Enter the password for '$GUSER': "
read -s -r -e PASS
echo ; echo

for wiki in $WIKI_LIST ; do
	BACKUP_DIR="$ROOT_BACKUP_DIR/$wiki"
	echo "*** Exporting '$wiki' in '$BACKUP_DIR'..."
	echo "Press Enter to continue."
	read

	mkdir "$BACKUP_DIR"
	java -cp "$JAR_BIN" com.google.sites.liberation.export.Main \
		-w "$wiki" \
		-u "$GUSER" \
		-p "$PASS" \
		-f "$BACKUP_DIR"
	echo
done


Beware of leading zeros in Bash numeric variables

August 7, 2010

Suppose you have a (user-supplied) value with leading zeros in a numeric variable. For example, you number something with zero-padded numbers consisting of 3 digits: 001, 002, 003, and so on. This label is assigned to a Bash variable named $N.

As long as the numbers stay below 008 (001 to 007 have the same value in octal and decimal), or as long as you use the variable only in text interpolation, you’re safe. For example, the following works just fine:

N=016
echo "Value: $N"
# result is "Value: 016"

However… :)
If you start using this variable as a number in arithmetic, then you’re in trouble. Here is an example:

N=016
echo $((N + 2))
# result is 16, not the expected 18!
printf %d "$N"
# result is 14, not the expected 16!

You probably already see the pattern: “016” is treated not as a decimal number but as an octal one, because of the leading zero. This is explained in the man page of bash, section “ARITHMETIC EVALUATION” (a.k.a. “Shell Arithmetic”).

In order to force decimal interpretation and, as a side effect, also remove any leading zeros from a Bash variable, you need to treat it as follows:

N=016
N=$((10#$N)) # force decimal (base 10)
echo $((N + 2))
# result is 18, ok
printf %d "$N"
# result is 16, ok
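
If you need the zero-padded label back after doing arithmetic, printf can restore the padding. This is only a small sketch; the variable names NEXT and LABEL are made up for the example, and the width of 3 digits is assumed from the 001, 002, 003 numbering above.

```shell
#!/bin/bash

N=016
NEXT=$((10#$N + 1))            # force base 10 before adding: 16 + 1
printf -v LABEL '%03d' "$NEXT" # pad with zeros to 3 digits again
echo "$LABEL"
# prints: 017
```

Keep the padded value only as a label, and convert it with 10# again every time you calculate with it.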

Note also that there’s another caveat: forcing the number to base 10 doesn’t actually validate that it contains only [0-9] characters. Read the very last paragraph of the man page of bash, section “ARITHMETIC EVALUATION” (a.k.a. “Shell Arithmetic”), for more details on how digits can be represented by letters and symbols. My tests show, however, that you can’t operate with invalid numbers in base 10, though I’m no expert here. To be on the safe side, I would suggest that you validate your numbers with a strict regular expression if you don’t trust the input data.
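
The validation could look roughly like the following sketch. The function name to_decimal is hypothetical, and the regular expression assumes that only non-negative integers are acceptable input.

```shell
#!/bin/bash

# Hypothetical helper: validate "$1" strictly, then print it as a
# decimal number without leading zeros. Returns 1 on invalid input.
to_decimal() {
	local raw="$1"
	if [[ ! "$raw" =~ ^[0-9]+$ ]]; then
		echo "ERROR: '$raw' is not a non-negative decimal integer." >&2
		return 1
	fi
	echo "$((10#$raw))" # safe now: only the digits 0-9 are present
}

to_decimal 016
# prints: 16
to_decimal 008
# prints: 8
to_decimal 1a6 || echo 'rejected'
# prints an error on stderr, then: rejected
```

The check runs before the 10# conversion on purpose, so that nothing but digits ever reaches the arithmetic evaluation.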

