I recently needed to escape some user-supplied input for use in a URL, in a Bash script. This is what PHP's urlencode() and Perl's URI::Escape::uri_escape() functions do, for example. My initial approach was to call Perl from the Bash script:
    #!/bin/bash
    function urlencode() {
        echo -n "$1" | perl -MURI::Escape -ne 'print uri_escape($_)'
    }
However, I wanted to optimize the Bash script so that it does not have to fork() a Perl interpreter on every call, which could be CPU intensive if the script is executed often. So I ended up with the following solution, coded entirely in Bash, using Bash string manipulation and Bash associative arrays:
    #!/bin/bash
    set -u

    declare -A ord_hash  # associative hash; requires Bash version 4

    function init_urlencode() {
        # this is the whole ASCII set, without the chr(0) and chr(255) characters
        ASCII='...!"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ЂЃ‚ѓ„…†‡€‰Љ‹ЊЌЋЏђ‘’“”•–—™љ›њќћџ ЎўЈ¤Ґ¦§Ё©Є«¬®Ї°±Ііґµ¶·ё№є»јЅѕїАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэю...'
        # download the script, don't copy paste it from the blog page!
        # chr(0) cannot be stored in a Bash variable

        local idx
        for idx in {0..253}; do  # 0..253 = 254 elements = length($ASCII)
            local c="${ASCII:$idx:1}"  # VERY SLOW
            local store_idx=$(($idx+1))
            ord_hash["$c"]="$store_idx"  # chr(255) cannot be used as a key
        done
    }

    function urlencode() {
        local inp="$1"
        local len="${#inp}"
        local n=0
        local val
        while [ "$n" -lt "$len" ]; do
            local c="${inp:$n:1}"  # VERY SLOW
            if [ "$c" == "я" ]; then  # chr(255) cannot be used as a key
                val=255
            else
                val="${ord_hash[$c]}"
            fi
            printf '%%%02X' "$val"
            n=$((n+1))
        done
    }

    init_urlencode  # call only once
    urlencode 'some^fancy#text'
The logic works pretty well, but the performance is terrible. It turned out that Bash string manipulation is rather slow. So I finally went back to using Perl, the same way I did initially. For very small strings, on the order of a few characters, the Bash version should be fine. For anything longer, this implementation is not recommended.
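As a side note, the character-to-code lookup does not strictly need a hand-built table. Bash's printf treats an argument that starts with a single quote as "the numeric code of the next character", so the same percent-encoding of every character can be sketched as below. This is a different technique from the hash-based script, not a drop-in copy of it: the function name urlencode_bytes is made up for illustration, LC_ALL=C is set on the assumption that the input should be walked byte by byte, and the sketch has only been reasoned through for 7-bit ASCII input. It has not been benchmarked, and it still loops over the string one character at a time, so the caveat about long strings may well still apply.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): percent-encodes
# every byte of its first argument, like the hash-based urlencode() does.
urlencode_bytes() {
    local LC_ALL=C            # assumption: index the string by bytes, not characters
    local inp="$1" out="" c hex n
    for (( n = 0; n < ${#inp}; n++ )); do
        c="${inp:n:1}"
        # The leading single quote makes printf use the character's numeric code.
        printf -v hex '%%%02X' "'$c"
        out+="$hex"
    done
    printf '%s\n' "$out"
}

urlencode_bytes 'some^fancy#text'
```

No initialization call is needed here, since there is no table to build; the function can be called directly.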
If you still want to use the hash-based Bash script, please download it directly from here, because the blog page messed up some of the special characters in the ASCII string.