I recently needed to escape some user-supplied input for use in a URL, from within a Bash script. This is what PHP's urlencode() and Perl's URI::Escape::uri_escape() functions do, for example. My initial approach was to call Perl from the Bash script:
#!/bin/bash
function urlencode() {
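    # printf is used instead of echo -n so that input such as "-n" or "-e" is passed through literally
    printf '%s' "$1" | perl -MURI::Escape -ne 'print uri_escape($_)'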
}
However, I wanted to avoid having to fork() a Perl interpreter on every call, which can be CPU-intensive if the Bash script is executed often. So I ended up with the following solution, written entirely in Bash, using Bash string manipulation and a Bash associative array:
#!/bin/bash
set -u
declare -A ord_hash # associative array; requires Bash version 4 or later
function init_urlencode() {
    # this is the whole ASCII set, without the chr(0) and chr(255) characters
ASCII='...!"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ЂЃ‚ѓ„…†‡€‰Љ‹ЊЌЋЏђ‘’“”•–—™љ›њќћџ ЎўЈ¤Ґ¦§Ё©Є«¬®Ї°±Ііґµ¶·ё№є»јЅѕїАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэю...'
    # download the script, don't copy and paste it from the blog page!
    # chr(0) cannot be stored in a Bash variable
    local idx
    for idx in {0..253}; do # 0..253 = 254 elements = length($ASCII)
        local c="${ASCII:$idx:1}" # VERY SLOW
        local store_idx=$((idx + 1))
        ord_hash["$c"]="$store_idx"
        # chr(255) cannot be used as a key
    done
}
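function urlencode() {
    # Note: this percent-encodes every character, including letters and digits,
    # whereas uri_escape() leaves unreserved characters as they are.
    local inp="$1"
    local len="${#inp}"
    local n=0
    local val
    while [ "$n" -lt "$len" ]; do
        local c="${inp:$n:1}" # VERY SLOW
        if [ "$c" == "я" ]; then # chr(255) cannot be used as a key
            val=255
        else
            val="${ord_hash[$c]}"
        fi
        printf '%%%02X' "$val"
        n=$((n+1))
    done
}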
init_urlencode # call only once
urlencode 'some^fancy#text'
The logic works pretty well, but the performance is terrible. It turned out that Bash string manipulation, the substring extraction in particular, is rather slow. So I finally went back to using Perl, the same way I did it initially. For very small strings, on the order of a few characters, the Bash version should be fine. For anything longer, this implementation is not recommended.
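For comparison, here is a minimal sketch of a different pure-Bash approach that needs no lookup table. It is not the script described above: the function name urlencode_printf is made up for illustration, and it relies on the printf builtin's leading-quote syntax ("'c" yields the numeric code of c) and on LC_ALL=C so that the string is walked byte by byte. Unlike the function above, it leaves letters, digits and the characters - . _ ~ unencoded. It still extracts one character at a time with substring expansion, so it probably shares the same performance limitation on long strings.

```shell
#!/bin/bash
# Hypothetical alternative sketch; urlencode_printf is an illustrative name,
# not part of the original script.
function urlencode_printf() {
    local LC_ALL=C                 # treat the input as bytes, not multi-byte characters
    local inp="$1" out="" c i
    for (( i = 0; i < ${#inp}; i++ )); do
        c="${inp:i:1}"             # substring extraction, same as in the original
        case "$c" in
            [a-zA-Z0-9.~_-])       # unreserved characters are copied as-is
                out+="$c"
                ;;
            *)                     # everything else becomes %XX
                printf -v c '%%%02X' "'$c"
                out+="$c"
                ;;
        esac
    done
    printf '%s\n' "$out"
}
```

Calling urlencode_printf 'some^fancy#text' prints some%5Efancy%23text.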
If you still want to use the Bash code, please download it directly from here, because the blog page messed up some of the special ASCII characters.
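As an aside, one way to avoid pasting literal special characters at all is to generate the lookup table from character codes. The following is only a sketch of that idea under stated assumptions: the names ord_demo and init_ord_demo are invented here, and the table is deliberately limited to digits and ASCII letters to keep the keys unproblematic. It is not the downloadable script mentioned above.

```shell
#!/bin/bash
# Hypothetical sketch: ord_demo and init_ord_demo are illustrative names,
# not part of the original script. Requires Bash 4 or later.
declare -A ord_demo

function init_ord_demo() {
    local code oct ch
    # Only digits and ASCII letters are covered in this sketch.
    for code in {48..57} {65..90} {97..122}; do
        printf -v oct '%03o' "$code"   # character code as three octal digits
        printf -v ch "\\$oct"          # octal escape turned into the character itself
        ord_demo["$ch"]="$code"        # map character to its numeric code
    done
}

init_ord_demo # call only once
printf '%%%02X\n' "${ord_demo[A]}"
```

The last line looks up the code for A and prints it percent-encoded, the same way the urlencode function above formats each value.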