/contrib/famzah

Enthusiasm never stops

URL escape in Bash


I recently needed to escape some user-supplied input for a URL variable in a Bash script. This is what the PHP urlencode() and Perl URI::Escape::uri_escape() functions do, for example. My initial approach was to call Perl from the Bash script:

#!/bin/bash
function urlencode() {
	echo -n "$1" | perl -MURI::Escape -ne 'print uri_escape($_)'
}

However, I wanted to optimize the Bash script by not having to fork() a Perl interpreter on every call, which can be CPU-intensive if the script is executed often. So I ended up with the following solution, written entirely in Bash, using Bash string manipulation and Bash associative arrays:

#!/bin/bash
set -u

declare -A ord_hash # associative array; requires Bash version 4 or newer

function init_urlencode() {
	# this is the whole 8-bit character set (codes 1 to 254), without the chr(0) and chr(255) characters
	ASCII='...!"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ЂЃ‚ѓ„…†‡€‰Љ‹ЊЌЋЏђ‘’“”•–—˜™љ›њќћџ ЎўЈ¤Ґ¦§Ё©Є«¬­®Ї°±Ііґµ¶·ё№є»јЅѕїАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэю...'
	# download the script, don't copy paste it from the blog page!

	# chr(0) cannot be stored in a Bash variable

	local idx
	for idx in {0..253}; do # 0..253 = 254 elements = length($ASCII)
		local c="${ASCII:$idx:1}" # VERY SLOW
		local store_idx=$(($idx+1))
		ord_hash["$c"]="$store_idx"
		# chr(255) cannot be used as a key
	done
}

function urlencode() {
	local inp="$1"
	local len="${#inp}"
	local n=0
	local val
	while [ "$n" -lt "$len" ]; do
		local c="${inp:$n:1}" # VERY SLOW
		if [ "$c" == "я" ]; then # chr(255) cannot be used as a key
			val=255
		else
			val="${ord_hash[$c]}"
		fi
		printf '%%%02X' "$val"
		n=$((n+1))
	done
}

init_urlencode # call only once
urlencode 'some^fancy#text'

The logic works well, but the performance is terrible. It turned out that Bash string manipulation is rather slow, so I finally went back to using Perl, the same way I did initially. For very small strings, on the order of a few characters, the Bash version should be fine. For anything longer, this implementation is not recommended.
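For comparison, here is a minimal sketch of a different pure-Bash technique, not the one used in the script above: the printf builtin can return a character's numeric code when its argument starts with a single quote, so no lookup table is needed. The function name urlencode_printf is hypothetical. Unlike the script above, this sketch leaves the RFC 3986 unreserved characters (letters, digits, "-", ".", "_", "~") unescaped. It still extracts one character at a time with ${inp:n:1}, so the performance caveat above still applies, and I have only checked it against ASCII input; how bytes above 127 are handled is an assumption that depends on the platform.

```shell
#!/bin/bash

# Hypothetical helper: percent-encode "$1" without a lookup table.
urlencode_printf() {
	local LC_ALL=C          # work on bytes and keep the [a-z] ranges ASCII-only
	local inp="$1" out="" c n
	for (( n = 0; n < ${#inp}; n++ )); do
		c="${inp:n:1}"
		case "$c" in
			[a-zA-Z0-9._~-])
				out+="$c"   # unreserved character, copied as-is
				;;
			*)
				# "'$c" makes printf use the numeric code of the character
				printf -v c '%%%02X' "'$c"
				out+="$c"
				;;
		esac
	done
	printf '%s' "$out"
}

urlencode_printf 'some^fancy#text'   # prints some%5Efancy%23text
```

Building the result in a variable and printing it once avoids one printf call to standard output per character, but it does not remove the per-character substring cost.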

If you still want to use the Bash code, please download it directly from here, because the blog page messed up some of the special ASCII characters.
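Since a pasted character table is easy to corrupt, one option is to generate the table instead of embedding it. The following is only a sketch under stated assumptions: it covers printable ASCII (codes 32 to 126) rather than the full range the original script handles, and the names ord_hash_generated and init_ord_hash_generated are hypothetical, not part of the original script.

```shell
#!/bin/bash

# Requires Bash 4 or newer for associative arrays.
declare -A ord_hash_generated   # hypothetical name; maps character -> numeric code

init_ord_hash_generated() {
	local code hex c
	for (( code = 32; code <= 126; code++ )); do
		printf -v hex '%02x' "$code"   # e.g. 65 -> "41"
		printf -v c '%b' "\\x$hex"     # e.g. "\x41" -> "A"
		ord_hash_generated["$c"]="$code"
	done
}

init_ord_hash_generated   # call only once
key='#'
printf '%%%02X\n' "${ord_hash_generated[$key]}"   # prints %23
```

Characters outside the generated range have no entry, so a caller running with set -u would need to check for a missing key before the lookup.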

Author: Ivan Zahariev

An experienced Linux & IT enthusiast, Engineer by heart, Systems architect & developer.
