Enthusiasm never stops

Bash: Split a string into columns by white-space without invoking a subshell

The classical approach is:

RESULT="$(echo "$LINE" | awk '{print $1}')" # forks a subshell and an awk process

However, processing thousands of lines this way fork()s thousands of processes, which hurts performance and makes your script CPU-hungry.

Here is the efficient solution that my colleagues at work and I found:

COLS=( $LINE ); # parses columns without executing a subshell
RESULT="${COLS[0]}"; # returns first column (0-based indexes)

Here is an example:

LINE="col0 col1  col2     col3  col4      " # white-space including tab chars
COLS=( $LINE ); # parses columns without executing a subshell

echo "${COLS[0]}"; # prints "col0"
echo "${COLS[1]}"; # prints "col1"
echo "${COLS[2]}"; # prints "col2"
echo "${COLS[3]}"; # prints "col3"
echo "${COLS[4]}"; # prints "col4"
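
To put this into context, here is a minimal sketch of the same technique inside a loop. The sample data in INPUT and the FIRST_COLS array are made up for illustration; in a real script the lines would more likely come from a file or a pipe.

```shell
#!/bin/bash
# Hypothetical sample input: three lines separated by newlines,
# with columns separated by spaces or a tab.
INPUT=$'alice 10 x\nbob   20 y\ncarol\t30 z'

FIRST_COLS=()
while IFS= read -r LINE; do
    COLS=( $LINE )                 # split by white-space, no subshell
    FIRST_COLS+=( "${COLS[0]}" )   # keep the first column of each line
done <<< "$INPUT"

printf '%s\n' "${FIRST_COLS[@]}"   # prints "alice", "bob" and "carol", one per line
```

The loop body uses only Bash built-ins, so no extra process is started per line.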

If you want to split not by white-space but by any other character, you can temporarily change the IFS variable which determines how Bash recognizes fields and word boundaries.
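
A minimal sketch of splitting by a different character, assuming a colon-separated sample line. The LINE value and the OLD_IFS helper variable are only illustrative, and the sketch assumes IFS is set to its default value beforehand.

```shell
#!/bin/bash
LINE="root:x:0:0:root:/root:/bin/bash"   # hypothetical colon-separated sample

OLD_IFS="$IFS"      # remember the current value
IFS=':'             # split by colons instead of white-space
COLS=( $LINE )      # parses columns without executing a subshell
IFS="$OLD_IFS"      # restore the previous behaviour

echo "${COLS[0]}"   # prints "root"
echo "${COLS[6]}"   # prints "/bin/bash"
```

Restoring IFS right after the assignment keeps the change from affecting the rest of the script.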

Update: Aug/2016: I’ve encountered a bug in Bash where this splitting doesn’t work as expected! Please see the comments below.
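
One thing to keep in mind: an unquoted expansion such as $LINE undergoes pathname expansion (globbing) in addition to word splitting, so tokens like [1] or * can be replaced by matching file names from the current directory. Whether that explains the behaviour reported in the comments is only an assumption, but the sketch below shows one way to split without globbing, using the read built-in. The sample value of V is made up for illustration.

```shell
#!/bin/bash
V='[a1b] [1] {0..2} $A *'    # sample tokens that contain glob characters

# "read -a" splits by IFS but does not perform pathname expansion,
# and being a built-in it does not fork a process.
read -r -a COLS <<< "$V"

echo "${#COLS[@]}"   # prints "5"
echo "${COLS[0]}"    # prints "[a1b]"
echo "${COLS[4]}"    # prints "*"
```

Another option is to disable globbing with "set -f" before the array assignment and re-enable it with "set +f" afterwards.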

Author: Ivan Zahariev

An experienced Linux & IT enthusiast, Engineer by heart, Systems architect & developer, Electronics hobbyist, Rock music & Karaoke fan, Novice guitar player & Go-kart pilot.

3 thoughts on “Bash: Split a string into columns by white-space without invoking a subshell”

  1. Nice 🙂 You should add a little bit of context to make it more obvious (wrap it in a loop).

  2. Unfortunately, a bug was introduced between Bash versions 4.3.11 and 4.3.30. Here is a sample Bash script:

    #!/bin/bash
    set -u
    
    A='test'
    V='`ls` [a1b] [1] [0] [2] [abc] {1} {0} {0..2} $A $(( 1 + 2 )) [[1]] [[ 1 ]]'
    cols=( $V )
    
    for i in $(seq 0 "$(( ${#cols[@]} - 1 ))") ; do
            echo "$i: ${cols[$i]}"
    done 
    

    Here is the correct result when executed under Bash version 4.3.11:

    0: `ls`
    1: [a1b]
    2: [1]
    3: [0]
    4: [2]
    5: [abc]
    6: {1}
    7: {0}
    8: {0..2}
    9: $A
    10: $((
    11: 1
    12: +
    13: 2
    14: ))
    15: [[1]]
    16: [[
    17: 1
    18: ]] 
    

    Here is the incorrect result when executed under Bash version 4.3.30:

    0: `ls`
    1: 1
    2: 1
    3: [0]
    4: [2]
    5: [abc]
    6: {1}
    7: {0}
    8: {0..2}
    9: $A
    10: $((
    11: 1
    12: +
    13: 2
    14: ))
    15: [[1]]
    16: [[
    17: 1
    18: ]] 
    

    You can spot the difference for indexes “1” and “2”.
