Bash: Split a string into columns by white-space without invoking a subshell

The classical approach is:

RESULT="$(echo "$LINE"| awk '{print $1}')" # executes in a subshell 

However, processing thousands of lines this way forks thousands of processes (a subshell plus awk for every line), which hurts performance and makes your script CPU-hungry.

Here is a more efficient way to do it:

LINE="col0 col1  col2     col3  col4      "
COLS=()

for val in $LINE ; do # $LINE is unquoted on purpose, so that Bash splits it on white-space
        COLS+=("$val")
done

echo "${COLS[0]}"; # prints "col0"
echo "${COLS[1]}"; # prints "col1"
echo "${COLS[2]}"; # prints "col2"
echo "${COLS[3]}"; # prints "col3"
echo "${COLS[4]}"; # prints "col4"
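To put this in context, here is a minimal sketch that applies the same splitting to every line of a multi-line input. The sample INPUT and the FIRST_COLS and COUNT variables are hypothetical names used only for illustration. The loop is fed by a here-string, so it runs in the current shell and the arrays are still available after it ends.

```bash
#!/bin/bash
# Hypothetical sample input: three lines with irregular white-space (spaces and a tab).
INPUT=$'a1 b1   c1\na2\tb2 c2\na3 b3 c3   '

FIRST_COLS=()   # collects the first column of every line
COUNT=0         # total number of columns seen

while IFS= read -r LINE ; do
        COLS=()
        for val in $LINE ; do   # unquoted on purpose: Bash splits on white-space
                COLS+=("$val")
        done
        FIRST_COLS+=("${COLS[0]}")
        COUNT=$(( COUNT + ${#COLS[@]} ))
done <<< "$INPUT"   # here-string: no pipe, so the loop stays in the current shell

echo "${FIRST_COLS[*]}"   # prints "a1 a2 a3"
echo "$COUNT"             # prints "9"
```

Note that "IFS=" on the read line applies only to the read command itself; the for loop still splits on the default white-space.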

If you want to split on some other character instead of white-space, you can temporarily change the IFS variable, which determines how Bash recognizes fields and word boundaries.
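As a rough sketch of the IFS approach, the example below splits a colon-separated record. The sample LINE value and the OLD_IFS and FIELDS names are made up for illustration; the important part is saving IFS before changing it and restoring it afterwards.

```bash
#!/bin/bash
LINE="root:x:0:0:root:/root:/bin/bash"   # hypothetical colon-separated record

OLD_IFS="$IFS"   # remember the current separators
IFS=':'          # split on colons instead of white-space
FIELDS=()
for val in $LINE ; do
        FIELDS+=("$val")
done
IFS="$OLD_IFS"   # restore the previous value

echo "${FIELDS[0]}"    # prints "root"
echo "${FIELDS[6]}"    # prints "/bin/bash"
echo "${#FIELDS[@]}"   # prints "7"
```

Keep in mind that a non-white-space separator is not collapsed the way spaces and tabs are: two adjacent colons produce an empty field between them.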

P.S. For the record, here is the old solution:

#
# OLD CODE
# Update: Aug/2016: I've encountered a bug in Bash where this splitting doesn't work as expected! Please see the comments below.
#

# Here is the effective solution which I found with my colleagues at work:

COLS=( $LINE ); # parses columns without executing a subshell
RESULT="${COLS[0]}"; # returns first column (0-based indexes)

# Here is an example:

LINE="col0 col1  col2     col3  col4      " # white-space including tab chars
COLS=( $LINE ); # parses columns without executing a subshell

echo "${COLS[0]}"; # prints "col0"
echo "${COLS[1]}"; # prints "col1"
echo "${COLS[2]}"; # prints "col2"
echo "${COLS[3]}"; # prints "col3"
echo "${COLS[4]}"; # prints "col4"
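
A word of caution that applies to both the loop and the array assignment: because $LINE is left unquoted, Bash also performs pathname expansion (globbing) on every word after splitting it. A word such as [1] or * is replaced by matching file names if any exist in the current directory. This may be what is behind the differing output reported in the comments below, but that is only an assumption on my part; the report does not say which files were present. Here is a minimal sketch of two ways to guard against it, with a made-up LINE value: switching globbing off with "set -f" around the split, or using the "read -a" builtin, which splits on IFS without globbing.

```bash
#!/bin/bash
LINE='[a1b] [1] * col3'   # hypothetical input whose words look like glob patterns

# Option 1: disable pathname expansion while splitting.
set -f
COLS=()
for val in $LINE ; do
        COLS+=("$val")
done
set +f   # re-enable pathname expansion (assumes it was enabled before)

echo "${COLS[0]}"    # prints "[a1b]"
echo "${COLS[2]}"    # prints "*"
echo "${#COLS[@]}"   # prints "4"

# Option 2: let the read builtin do the splitting; it never globs.
read -r -a COLS2 <<< "$LINE"

echo "${COLS2[1]}"   # prints "[1]"
```

Neither option starts a subshell, so the performance argument from the beginning of the post still holds.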

Author: Ivan Zahariev

An experienced Linux & IT enthusiast, Engineer by heart, Systems architect & developer.

4 thoughts on “Bash: Split a string into columns by white-space without invoking a subshell”

  1. Nice 🙂 You should add a little bit of context to make it more obvious (wrap it in a loop).

  2. Unfortunately, a bug was introduced between Bash versions 4.3.11 and 4.3.30. Here is a sample Bash script:

    #!/bin/bash
    set -u
    
    A='test'
    V='`ls` [a1b] [1] [0] [2] [abc] {1} {0} {0..2} $A $(( 1 + 2 )) [[1]] [[ 1 ]]'
    cols=( $V )
    
    for i in $(seq 0 "$(( ${#cols[@]} - 1 ))") ; do
            echo "$i: ${cols[$i]}"
    done 
    

    Here is the correct result when executed under Bash version 4.3.11:

    0: `ls`
    1: [a1b]
    2: [1]
    3: [0]
    4: [2]
    5: [abc]
    6: {1}
    7: {0}
    8: {0..2}
    9: $A
    10: $((
    11: 1
    12: +
    13: 2
    14: ))
    15: [[1]]
    16: [[
    17: 1
    18: ]] 
    

    Here is the incorrect result when executed under Bash version 4.3.30:

    0: `ls`
    1: 1
    2: 1
    3: [0]
    4: [2]
    5: [abc]
    6: {1}
    7: {0}
    8: {0..2}
    9: $A
    10: $((
    11: 1
    12: +
    13: 2
    14: ))
    15: [[1]]
    16: [[
    17: 1
    18: ]] 
    

    You can spot the difference for indexes “1” and “2”.
