/contrib/famzah

Enthusiasm never stops



Bash: Process null-terminated results piped from external commands

When working with filenames, we usually need to terminate each result record with the special null character, because it is the only byte that cannot appear in a file path. Filenames may contain all kinds of special symbols, including white-space and even the newline character “\n”.

There is already a great answer on how to do this in the StackOverflow topic “Capturing output of find . -print0 into a bash array”. The proposed solution doesn’t invoke any sub-shells, which is great, and it explains all caveats in detail. To be truly universal, however, the solution must not rely on the hard-coded file descriptor “3”. Another great answer at SO gives an example of how to dynamically use the next available file descriptor.
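A minimal sketch of the dynamic file-descriptor allocation mentioned above: with the {varname} redirection syntax (available since Bash 4.1), Bash picks a free descriptor, 10 or higher, and stores its number in the variable. The temporary file and the variable names are only for illustration.

```shell
#!/bin/bash
# Create a throwaway file to read from (illustration only).
tmpfile="$(mktemp)"
printf 'first line\n' > "$tmpfile"

# Bash allocates a free descriptor (>= 10) and stores its number in $fd.
exec {fd}<"$tmpfile"
echo "allocated fd: $fd"

# Use it like any fixed descriptor, e.g. with read -u.
read -r -u "$fd" line
echo "$line"   # prints "first line"

# Close the descriptor when done and remove the temporary file.
exec {fd}<&-
rm -f -- "$tmpfile"
```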

Here is a solution that works without sub-shells and without depending on a static FD:

a=()
while IFS='' read -r -u"$FD" -d $'\0' file; do
  # note that $IFS has its default value here; IFS='' applies only to read
  a+=("$file") # or however you want to process each file
done {FD}< <(find /tmp -type f -print0)
exec {FD}<&- # close the file descriptor

# the result is available outside the loop, too
echo "${a[0]}" # 1st file
echo "${a[1]}" # 2nd file
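To see why the null-terminated variant matters, here is a small self-contained check, a sketch assuming a find that supports -print0 (GNU and BSD find both do). It creates a temporary directory with one ordinary file and one file whose name contains a newline, then compares the loop above with a plain line count. The directory and file names are made up for the test.

```shell
#!/bin/bash
tmpdir="$(mktemp -d)"
# One ordinary file and one whose name contains a newline.
touch "$tmpdir/plain.txt" "$tmpdir/two"$'\n'"lines.txt"

files=()
while IFS='' read -r -u"$FD" -d '' file; do   # -d '' is the same as -d $'\0'
	files+=("$file")
done {FD}< <(find "$tmpdir" -type f -print0)
exec {FD}<&-

echo "null-terminated records: ${#files[@]}"   # prints 2
# Counting lines instead reports 3, because one name spans two lines.
echo "newline-separated lines: $(find "$tmpdir" -type f | wc -l)"

rm -rf -- "$tmpdir"
```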

Terminal icon created by Julian Turner



Poor man’s AWS CLI “s3 sync” bandwidth limit

Here you go:

PID=13424; while [ 1 ]; do kill -STOP "$PID" ; sleep 0.4 ; kill -CONT "$PID" ; sleep 0.6 ; done

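Replace 13424 with the PID of your “aws s3 sync” process and adjust the two sleep intervals (in seconds) as needed. The process runs only during the second interval, so with these values it gets roughly 60% of its normal running time, which lowers its average bandwidth accordingly.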
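A slightly more careful variant of the same SIGSTOP/SIGCONT trick, written as a function. It is a sketch: the name throttle_pid and its arguments are hypothetical. The loop ends once kill -STOP fails because the process is gone, and an INT/TERM trap resumes the process so that pressing Ctrl+C does not leave it frozen. It assumes the target is not a child of the same script, since a finished child remains a zombie until it is waited for, and kill still succeeds on a zombie.

```shell
#!/bin/bash
# throttle_pid PID STOPPED_SECONDS RUNNING_SECONDS  (hypothetical helper)
throttle_pid() {
	local pid="$1" stopped="$2" running="$3"
	# On Ctrl+C or TERM, resume the target before exiting so it is not left stopped.
	trap 'kill -CONT "$pid" 2>/dev/null; exit 1' INT TERM
	# kill -STOP fails once the process no longer exists, which ends the loop.
	while kill -STOP "$pid" 2>/dev/null; do
		sleep "$stopped"
		kill -CONT "$pid" 2>/dev/null || break
		sleep "$running"
	done
	trap - INT TERM
}

# Example: throttle_pid 13424 0.4 0.6
```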

This is a quick hack until the AWS CLI team releases an official option, which is being discussed under AWS CLI issue #1090.



Using flock() in Bash without invoking a subshell

The flock(1) utility on Linux manages flock(2) advisory locks from within shell scripts or the command line. This lets you synchronize your Bash scripts with all your other applications written in Perl, Python, C, etc.

I’ll focus on the third usage form, where flock(1) is used inside a Bash script. Here is what the man page suggests:

#!/bin/bash

(
flock -s 200

# ... commands executed under lock ...

) 200>/var/lock/mylockfile

Unfortunately, this invokes a subshell which has the following drawbacks:

  • Variables assigned inside the subshell are not visible to the main shell script.
  • There is a performance penalty, because Bash has to fork() a child process.
  • The syntax coloring in “vim” does not work properly. 🙂
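The first drawback is easy to demonstrate: ( ... ) runs its commands in a forked copy of the shell, so assignments made there are lost when it exits, while a { ...; } group runs in the current shell. A minimal sketch with an illustrative variable name:

```shell
#!/bin/bash
state='initial'
( state='set inside the subshell' )   # runs in a forked copy of the shell
echo "$state"                         # prints "initial"

{ state='set inside the group'; }     # runs in the current shell
echo "$state"                         # prints "set inside the group"
```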

This motivated my colleague zImage to come up with a usage form which does not invoke a subshell in Bash:

#!/bin/bash

exec {lock_fd}>/var/lock/mylockfile || exit 1
flock -n "$lock_fd" || { echo "ERROR: flock() failed." >&2; exit 1; }

# ... commands executed under lock ...

flock -u "$lock_fd"

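Note that you can skip the “flock -u "$lock_fd"” unlock command if it is at the very end of your script. In that case, the lock is released automatically when your process terminates and its file descriptor is closed, as long as no background child process still holds an inherited copy of that descriptor.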
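Here is a self-contained sketch that exercises this pattern against a temporary lock file, assuming the util-linux flock(1) utility is installed. While the main script holds the lock, a separate “flock -n” on the same file is refused; after “flock -u”, it succeeds. The temporary file stands in for /var/lock/mylockfile, and the variable names are only for illustration.

```shell
#!/bin/bash
lockfile="$(mktemp)"   # stand-in for /var/lock/mylockfile

exec {lock_fd}>"$lockfile" || exit 1
flock -n "$lock_fd" || { echo "ERROR: flock() failed." >&2; exit 1; }

# A separate process opens the file on its own and asks for a
# non-blocking lock; it must fail while we hold ours.
if flock -n "$lockfile" true; then second_attempt='acquired'; else second_attempt='refused'; fi

counter=42   # assigned in the main shell; no subshell involved

flock -u "$lock_fd"
exec {lock_fd}>&-   # close the descriptor as well

# After unlocking, another process can take the lock.
if flock -n "$lockfile" true; then after_unlock='acquired'; else after_unlock='refused'; fi

echo "while locked: $second_attempt, after unlock: $after_unlock, counter=$counter"
rm -f -- "$lockfile"
```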



Bash: Split a string into columns by white-space without invoking a subshell

The classical approach is:

RESULT="$(echo "$LINE"| awk '{print $1}')" # executes in a subshell 

Processing thousands of lines this way, however, fork()s thousands of processes, which hurts performance and makes your script CPU-hungry.

Here is a more efficient way to do it:

LINE="col0 col1  col2     col3  col4      "
COLS=()

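set -f # disable pathname expansion, so a column such as "*" is not replaced by file names
for val in $LINE ; do
        COLS+=("$val")
done
set +f # re-enable pathname expansion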

echo "${COLS[0]}"; # prints "col0"
echo "${COLS[1]}"; # prints "col1"
echo "${COLS[2]}"; # prints "col2"
echo "${COLS[3]}"; # prints "col3"
echo "${COLS[4]}"; # prints "col4"

If you want to split by some other character instead of white-space, you can temporarily change the IFS variable, which determines how Bash recognizes field and word boundaries.
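An alternative worth considering, swapped in here and not the approach above, is the read builtin with -a: it splits on IFS into an array without starting awk, and the resulting fields never undergo pathname expansion. Prefixing IFS to the read command changes it only for that one command. A minimal sketch with made-up sample lines:

```shell
#!/bin/bash
# 1) Default IFS: runs of spaces/tabs separate fields.
LINE="col0 col1  *     col3  col4      "
read -r -a COLS <<< "$LINE"
echo "${#COLS[@]}"   # prints 5
echo "${COLS[2]}"    # prints "*" literally, not a list of files

# 2) Custom separator, set only for this read command.
PASSWD_LINE='root:x:0:0:root:/root:/bin/bash'
IFS=':' read -r -a FIELDS <<< "$PASSWD_LINE"
echo "${FIELDS[5]}"  # prints "/root"
```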

P.S. For the record, here is the old solution:

#
# OLD CODE
# Update: Aug/2016: I've encountered a bug in Bash where this splitting doesn't work as expected! Please see the comments below.
#

# Here is the effective solution which I found with my colleagues at work:

COLS=( $LINE ); # parses columns without executing a subshell
RESULT="${COLS[0]}"; # returns first column (0-based indexes)

# Here is an example:

LINE="col0 col1  col2     col3  col4      " # white-space including tab chars
COLS=( $LINE ); # parses columns without executing a subshell

echo "${COLS[0]}"; # prints "col0"
echo "${COLS[1]}"; # prints "col1"
echo "${COLS[2]}"; # prints "col2"
echo "${COLS[3]}"; # prints "col3"
echo "${COLS[4]}"; # prints "col4"



URL escape in Bash

I recently needed to escape some user-supplied input for a URL variable in a Bash script. This is what PHP’s urlencode() and Perl’s URI::Escape::uri_escape() functions do, for example. My initial approach was to call Perl from the Bash script:

#!/bin/bash
function urlencode() {
	# printf '%s' prints the input verbatim; "echo -n" would swallow input such as "-e"
	printf '%s' "$1" | perl -MURI::Escape -ne 'print uri_escape($_)'
}

However, I wanted to optimize the Bash script by not fork()ing a Perl interpreter on every call, which can be CPU-intensive if the script runs often. So I ended up with the following solution, coded entirely in Bash, using Bash string manipulation and Bash associative (hash) arrays:

#!/bin/bash
set -u

declare -A ord_hash # associative hash; requires Bash version 4

function init_urlencode() {
	# this is the whole 8-bit character set, without the chr(0) and chr(255) characters
	ASCII='...!"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ЂЃ‚ѓ„…†‡€‰Љ‹ЊЌЋЏђ‘’“”•–—˜™љ›њќћџ ЎўЈ¤Ґ¦§Ё©Є«¬­®Ї°±Ііґµ¶·ё№є»јЅѕїАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэю...'
	# download the script, don't copy paste it from the blog page!

	# chr(0) cannot be stored in a Bash variable

	local idx
	for idx in {0..253}; do # 0..253 = 254 elements = length($ASCII)
		local c="${ASCII:$idx:1}" # VERY SLOW
		local store_idx=$(($idx+1))
		ord_hash["$c"]="$store_idx"
		# chr(255) cannot be used as a key
	done
}

function urlencode() {
	local inp="$1"
	local len="${#inp}"
	local n=0
	local val
	while [ "$n" -lt "$len" ]; do
		local c="${inp:$n:1}" # VERY SLOW
		if [ "$c" == "я" ]; then # chr(255) cannot be used as a key
			val=255
		else
			val="${ord_hash[$c]}"
		fi
		printf '%%%02X' "$val"
		n=$((n+1))
	done
}

init_urlencode # call only once
urlencode 'some^fancy#text'

The logic works pretty well, but the performance is terrible: it turned out that Bash string manipulation is rather slow. So I finally went back to Perl, the same way I did it initially. For very small strings of a few characters, the Bash version should be fine, but for anything else it is not recommended.

If you still want to use the Bash code, please download it directly from here, because the blog page messed up some of the special ASCII characters.
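For comparison, here is a different pure-Bash sketch, not benchmarked: it replaces the lookup table with printf’s "'c" argument form, which yields the numeric code of a character, and it leaves the RFC 3986 unreserved characters (A-Z a-z 0-9 - _ . ~) unencoded. The function name urlencode_fast is hypothetical. It still extracts characters with ${inp:i:1}, so it may be just as slow on long strings, and it has been checked only with ASCII input.

```shell
#!/bin/bash
urlencode_fast() {
	local LC_ALL=C            # work byte by byte, not character by character
	local inp="$1" out='' c i
	for (( i = 0; i < ${#inp}; i++ )); do
		c="${inp:i:1}"
		case "$c" in
			[a-zA-Z0-9._~-]) out+="$c" ;;               # unreserved: keep as-is
			*) printf -v c '%%%02X' "'$c"; out+="$c" ;;  # "'X" gives the code of X
		esac
	done
	printf '%s' "$out"
}

urlencode_fast 'some^fancy#text'; echo   # prints some%5Efancy%23text
```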



Backup Google Sites automatically

I just found out how to make my Google Sites backup script almost non-interactive, so I decided to share it. I run this script every month in the Linux console, and the weekly backup of my hard disk then takes care of keeping an additional copy of the data.

Why bother backing up Google Sites?
While Google is very reliable and will probably never fail me here, I want an offline backup of my Google Sites pages in case someone steals my Google Account. So I back up. Online and offline, every week.

The backup script uses the wonderful free Java application “Google Sites Liberation”. My script is actually more of a sample Bash usage of this Java tool. You need to download the .jar file and store it in the same directory as the backup script; since JAR_BIN and ROOT_BACKUP_DIR are relative paths, run the script from that directory. The source code follows:

#!/bin/bash
set -e
set -u
set -o pipefail

trap 'echo "ERROR: Abnormal exit." >&2' ERR

# config BEGIN

GUSER='username@gmail.com'
WIKI_LIST='wiki1 wiki2 wiki3'
JAR_BIN='google-sites-liberation-1.0.4.jar'
ROOT_BACKUP_DIR='./sites.google.com'

# config END

echo "We are using '$JAR_BIN'. Check for a newer version:"
echo '	http://code.google.com/p/google-sites-liberation/downloads/list'
read

echo "The directory '$ROOT_BACKUP_DIR' will be deleted!!!"
echo 'Press Enter to confirm.'
read

rm -rf "$ROOT_BACKUP_DIR"
mkdir "$ROOT_BACKUP_DIR"

echo -n "Enter the password for '$GUSER': "
read -s -r -e PASS
echo ; echo

for wiki in $WIKI_LIST ; do
	BACKUP_DIR="$ROOT_BACKUP_DIR/$wiki"
	echo "*** Exporting '$wiki' in '$BACKUP_DIR'..."
	echo "Press Enter to continue."
	read

	mkdir "$BACKUP_DIR"
	java -cp "$JAR_BIN" com.google.sites.liberation.export.Main \
		-w "$wiki" \
		-u "$GUSER" \
		-p "$PASS" \
		-f "$BACKUP_DIR"
	echo
done




Beware of leading zeros in Bash numeric variables

Suppose a numeric variable holds a (user-supplied) value with leading zeros. For example, you number something with zero-padded three-digit numbers: 001, 002, 003, and so on, and assign this label to a Bash variable named $N.

As long as the numbers stay below 008, or as long as you use the variable only in text interpolation, you’re safe. For example, the following works just fine:

N=016
echo "Value: $N"
# result is "016"

However… 🙂
If you start using this variable as a number in arithmetic, you’re in trouble. Here is an example:

N=016
echo $((N + 2))
# result is 16, not the expected 18!
printf %d "$N"
# result is 14, not the expected 16!

You probably already see the pattern: because of the leading zero, “016” is treated not as a decimal number but as an octal one (octal 16 = 1×8 + 6 = 14). This is explained in the man page of bash, section “ARITHMETIC EVALUATION” (aka “Shell Arithmetic”).
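A quick way to convince yourself, as a minimal sketch: the leading zero and an explicit 8# base prefix give the same result. For the same reason, values such as 008 and 009 are not merely misread but rejected, because 8 and 9 are not octal digits (Bash reports “value too great for base”).

```shell
#!/bin/bash
echo $((016))        # prints 14: the leading zero means octal
echo $((8#16))       # prints 14: the same number with an explicit base 8
echo $((1*8 + 6))    # prints 14: octal 16 = 1*8 + 6
echo $((10#016))     # prints 16: an explicit base 10 overrides the leading zero
```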

To force decimal (base 10) interpretation of a Bash variable, which as a side effect also removes any leading zeros, treat it as follows:

N=016
N=$((10#$N)) # force decimal (base 10)
echo $((N + 2))
# result is 18, ok
printf %d "$N"
# result is 16, ok

Note also that there’s another caveat: forcing the number to decimal base 10 doesn’t actually validate that it contains only [0-9] characters. Read the very last paragraph of the man page of bash, section “ARITHMETIC EVALUATION” (aka “Shell Arithmetic”), for details on how digits can be represented by letters and symbols. My tests, however, show that you can’t operate with invalid numbers in base 10, though I’m no expert here. To be on the safe side, especially if you don’t trust the data input, I would suggest validating your numbers with a strict regular expression.
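A minimal sketch of that advice, combining a strict regular-expression check with the 10# prefix. The helper name to_decimal is hypothetical, and it accepts only non-negative integers made of ASCII digits; very long inputs would still overflow Bash’s integer arithmetic, which this sketch does not check.

```shell
#!/bin/bash
# to_decimal VALUE  (hypothetical helper): print VALUE as a base-10 integer.
to_decimal() {
	local input="$1"
	# Strict check: one or more digits 0-9, nothing else.
	if [[ ! $input =~ ^[0-9]+$ ]]; then
		echo "ERROR: '$input' is not a non-negative decimal integer." >&2
		return 1
	fi
	# 10# forces base 10, so leading zeros no longer mean octal.
	echo $((10#$input))
}

to_decimal 016                       # prints 16
to_decimal 009                       # prints 9
to_decimal 0x1F || echo "rejected"   # the regex rejects it
```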

