"sh" — The Unix Shell Scripting Language

A summary of practical techniques by Mike McCarthy, 2002


Scripts are text files that contain a set of instructions that can be executed by the shell. While scripts can be written in various languages (such as perl, awk, php, etc), this guide explains most of the common features of the Unix Bourne shell scripting language.


Comments
All comments are single-line and are preceded by the '#' character. The first line of a shell script should contain the special comment #!/bin/sh, which tells the kernel that this is a shell script whose interpreter resides in the bin directory. If this were a perl script, the first line comment would be #!/bin/perl (or wherever perl is installed). All other comments simply begin with the '#' followed by plain text.

# this is a comment


No compiling
Unlike C/C++ and other compiled languages, a shell script is not compiled into object code. It remains a plain text file which, when executed, can invoke various Unix software tools. Each tool (or command) is itself a miniature program which generally "does one thing well". To make a shell script executable, the chmod command must be used to set its execute permission. For example, chmod +x myscript sets it for all users (subject to the umask), while chmod go+x myscript adds it only for the group and others.
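
The steps above can be sketched end to end. This is only an illustration: the file name /tmp/hello_$$.sh is hypothetical, and it assumes /tmp is writable and allows execution.

```shell
#!/bin/sh
# Hypothetical file name; $$ (the shell's process ID) keeps it unique.
script="/tmp/hello_$$.sh"

# Write a two-line script: the interpreter line, then one command.
printf '%s\n' '#!/bin/sh' 'echo "hello from a script"' > "$script"

chmod +x "$script"        # set the execute permission
output=`"$script"`        # run the script and capture what it prints
echo "$output"

rm -f "$script"           # remove the demonstration file
```

No compile step appears anywhere; the text file is run directly once it is executable.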


Shell programming language
While a script could be nothing more than a simple list of Unix commands executed one-by-one, the shell programming language offers a rich set of instructions to enable flow control, variable assignments, and conditional tests. The language can only interpret character strings, not binary values such as integers, floats, etc. If a numeric value is needed, the 'expr' command can be used to temporarily treat a given character or string as a numeric value (similar to casting in C/C++).


Variables
Variables are symbols that can store string values. They can be used anywhere in the script and do not need to be declared although they are sometimes presented in a variable list with initially assigned values. As the name 'variable' suggests, the value of a variable can be assigned, and reassigned. Here are some examples:

LIST=""                       # the symbol 'LIST' is assigned a null string
EXIST=true                    # the symbol 'EXIST' is assigned the string value "true"
PROCESS=NO                    # the symbol 'PROCESS' is assigned the string value "NO"
ERROR1="command not found"    # the string "command not found" is assigned to 'ERROR1'


The value contained in a variable can be accessed by preceding the variable's name with a '$' character. For example, to print the value stored in 'ERROR1', the statement...
echo $ERROR1
...will send "command not found" to stdout. Note: the echo command automatically appends a new-line character at the end of the string. In cases where you do not want the new-line character, use printf like this...
printf "File already exists, overwrite? [y/n]: "
Use the '$' character anytime you need to output the value of a variable, but do not use it when assigning a value.
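
A short sketch of assignment, reassignment and output may help. The variable 'message' is introduced here only for illustration.

```shell
#!/bin/sh
ERROR1="command not found"     # assign: no '$' on the left-hand side
PROCESS=NO
PROCESS=YES                    # reassigning simply overwrites the old value

echo "$ERROR1"                 # echo appends a new-line character
printf "PROCESS is %s\n" "$PROCESS"   # printf prints only what the format asks for

message="error: $ERROR1"       # '$' expands the variable inside double quotes
echo "$message"
```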


Tests
To control the flow of a program, the 'test' command can be used to check the value of a variable or condition. There are two ways to use 'test': 1) by using the word "test", or 2) by using its more readable synonym ' [ '. If the square bracket method is used, the test condition must be followed by a closing ' ] ' bracket, and both brackets must be surrounded by whitespace. When a numeric operator such as '-gt' is used in a test, the values being tested are temporarily interpreted as numbers. When a string operator such as ' = ' is used, the test is a string comparison. For example...
if [ $VAL -gt 0 ]
then
statement(s)
fi
...tests a numeric value, while...
if [ $MYVAR = "OK" ]
then
statement(s)
fi
...tests a string.
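
Both forms of the test can be tried with concrete values. In this sketch the variables are quoted, an added precaution not shown above that keeps the test well-formed when a value is empty or contains spaces. The result variables are hypothetical names.

```shell
#!/bin/sh
VAL=5
MYVAR="OK"

# Numeric test, written with the word form of the command.
if test "$VAL" -gt 0
then
    numeric_result="positive"
fi

# String test, written with the bracket form.
if [ "$MYVAR" = "OK" ]
then
    string_result="matched"
fi

echo "$numeric_result $string_result"
```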


Conditional operators
The conditional operators are...

(Note the leading hyphen where present)
' = '
"is equal to" for strings -- the whitespace on either side of the '=' sign is required. The whitespace differentiates the condition from an assignment like STR="Exit", which contains no spaces around the '=' sign.
'-eq'
"is equal to" for numbers. It is not interchangeable with the ' = ' sign described above: "01" and "1" are equal under '-eq' but not under ' = '.
'-lt'
"is less than", like '<' in C/C++.
'-gt'
"is greater than", like '>' in C/C++.
'-le'
"is less than or equal to", like '<=' in C/C++.
'-ge'
"is greater than or equal to", like '>=' in C/C++.
'-ne'
"is not equal to" for numbers, like ' != ' in C/C++. (For strings, ' != ' can be used, as long as whitespace is inserted after the first operand and before the second operand as with the ' = ' sign described above.)

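The difference between the numeric and the string operators shows up when two values look different as text but are the same number. A minimal sketch, with made-up sample values:

```shell
#!/bin/sh
a="01"
b="1"

# Numeric comparison: both operands are read as the integer 1.
if [ "$a" -eq "$b" ] ; then numeric="equal" ; else numeric="different" ; fi

# String comparison: "01" and "1" are different character strings.
if [ "$a" = "$b" ] ; then string="equal" ; else string="different" ; fi

echo "numeric: $numeric"
echo "string: $string"
```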

Math operations
All shell variables are strings. To perform a math operation on a string variable, its value must be converted to a number with the 'expr' command substitution function. For example...

count=10
echo $count                   # will print the string '10'
count=`expr $count + 2`       # uses 'back tick' quote marks, see below
echo $count                   # will print the string '12'
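
A few more 'expr' operations, as a sketch. Each operand and operator must be a separate word, and the '*' must be escaped so that the shell does not expand it as a file name pattern. The variable names are illustrative.

```shell
#!/bin/sh
count=10
count=`expr $count + 2`         # addition: 12
product=`expr $count \* 3`      # multiplication: 36
quotient=`expr $count / 5`      # integer division: 2
remainder=`expr $count % 5`     # remainder: 2

echo "$count $product $quotient $remainder"
```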


Back ticks
The value of a variable can contain the output of a command using the backward single quote character (located to the left of the number '1' on standard keyboards). When a command is enclosed within these backward quotes (often called "back ticks"), its output can be trapped and assigned to a variable. For example, to set a string variable to the current date...

date=`date`

The value of 'date' will now be the string output by the Unix 'date' command. Therefore...

echo $date

...will display the same string as would result by typing 'date' at the command line, but here the string is trapped in a variable.
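
Any command can be captured this way, not only 'date'. In the sketch below, /usr/local/bin/myscript is a hypothetical path used only as sample input for the standard 'basename' and 'dirname' commands.

```shell
#!/bin/sh
path="/usr/local/bin/myscript"   # hypothetical sample path

name=`basename "$path"`          # last component of the path
dir=`dirname "$path"`            # everything before the last component
now=`date`                       # current date string, as in the text

echo "$name lives in $dir (checked on $now)"
```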


Flow control
Various control mechanisms can be used to control the flow of a shell program. As in most languages, there are methods for testing, looping and iterating. The syntax of each control mechanism is shown in the examples below...

if
if [ $VAR = "OK" ]
then
statement(s)
elif [ $VAR = "NOT_OK" ]
then
statement(s)
else
statement(s)
fi

(Note: to end an 'if' block, close it with 'fi' -- "if" spelled backward.)
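
A runnable version of the 'if' skeleton, with the placeholder statements filled in. The 'status' variable and its values are invented for the example. Note that 'elif', like 'if', needs its own 'then'.

```shell
#!/bin/sh
VAR="NOT_OK"

if [ "$VAR" = "OK" ]
then
    status="proceed"
elif [ "$VAR" = "NOT_OK" ]
then
    status="stop"
else
    status="unknown"
fi

echo "$status"
```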

case
case $1 in
-p)
statement(s) ;; (each case is terminated by a double semicolon)
-c)
statement(s) ;;
* ) (the 'default' case)
statement(s) ;;
esac (Note: to end a 'case' block, close it with 'esac' -- "case" spelled backward)
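
The 'case' skeleton can be exercised without typing a command line by using 'set --', which replaces the script's arguments. That command and the 'mode' variable are additions made for this sketch.

```shell
#!/bin/sh
set -- -c          # simulate calling the script with the single argument -c

case "$1" in
    -p)
        mode="print" ;;
    -c)
        mode="count" ;;
    *)
        mode="unknown" ;;   # reached when no earlier pattern matches
esac

echo "$mode"
```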


while
while test $var != "done" ; do
statement(s)
done
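
A 'while' loop usually changes the tested value inside its body so that the loop can end. The sketch below adds the numbers 1 to 3, using 'expr' for the arithmetic. The variable names are illustrative.

```shell
#!/bin/sh
num=1
sum=0

while [ "$num" -le 3 ]
do
    sum=`expr $sum + $num`     # running total: 1, then 3, then 6
    num=`expr $num + 1`        # advance the counter so the loop ends
done

echo "$sum"
```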


for
for element in $LIST
do
echo $element (example statement)
done
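
A complete 'for' loop over a whitespace-separated list might look like this. The list contents and the 'joined' and 'count' variables are made up for the example.

```shell
#!/bin/sh
LIST="alpha beta gamma"
count=0
joined=""

# $LIST is left unquoted on purpose so the shell splits it into tokens.
for element in $LIST
do
    count=`expr $count + 1`
    joined="${joined}(${element})"
done

echo "$count items: $joined"
```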


Notes:
Statements are normally placed one-per-line but can be included on the same line as long as there is a semi-colon terminator. For example, the syntax for an 'if' statement might be written like this...

if test $cond = "true" ; then

...instead of like this...

if test $cond = "true"
then (either method works, it's a matter of choice).

Similarly when using a while loop, the structure might be written...

while [ $num -gt 0 ] ; do


...instead of...

while [ $num -gt 0 ]
do (either method works, it's a matter of choice).

Remember that 'test' can be expressed with the word "test" or with the '[' ...']' bracket characters.


Quotation characters
Since shell programming works mainly with strings, there are specific uses of quotation marks. String values must be enclosed in quotes if they contain any whitespace, like this...

echo "this sentence contains white space"

Strings that do not contain any whitespace do not need the quotes, but it is often a good idea to use them anyway.

The example case block above tests for the specific cases -p and -c. Since those symbols are strings they might also be written "-p" and "-c". It is sometimes useful to append or prefix an arbitrary character with the string for comparing null strings like this...

if test "x$word" = "x" ; then

...In this example, if the value of 'word' is null it will be equal to the single character 'x' since 'word' was prefixed or concatenated with a single 'x'.

When strings do not contain any whitespace, the statement...

echo okay

...would have the same result as...

echo "okay"
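
The effect of quoting, and the 'x' prefix test, can be checked with a short sketch. The sample phrase contains two consecutive spaces on purpose, and all variable names are illustrative.

```shell
#!/bin/sh
word=""
if test "x$word" = "x" ; then
    state="empty"
else
    state="set"
fi

phrase="two  spaces"
unquoted=`echo $phrase`       # the shell splits the words; echo rejoins them with one space
quoted=`echo "$phrase"`       # the quotes pass the value through unchanged

echo "$state"
echo "$unquoted"
echo "$quoted"
```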


Input / Output
The commands 'echo' and 'read' are used to send strings to stdout, and get strings from stdin respectively. When using 'read', do not prefix the variable with a '$'. For example, if a variable 'val' will be set by the user, do this...

read val

...not this...

read $val

While there are other I/O commands, 'echo' and 'read' are the standard methods, similar to 'cout <<' and 'cin >>' in C++ or 'printf' and 'scanf' in C.
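
A sketch of 'read' in use. So that it runs without a keyboard, the input is supplied by a "here document" (the lines between <<EOF and EOF), a feature this guide does not otherwise cover. In an interactive script the redirection would be omitted.

```shell
#!/bin/sh
printf "Enter a value: "
read val <<EOF
forty two
EOF
echo "You entered: $val"

# With several variables, the line is split on whitespace and the
# last variable receives whatever is left over.
read first rest <<EOF
one two three
EOF
echo "first=$first rest=$rest"
```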


Special variables
In addition to programmer-defined variables, a script can use a set of special variables that the shell sets automatically and that hold values specific to the running script, such as its arguments and its process ID.

Some examples...

$? contains the exit value returned by the last executed command
$$ contains the process ID number of the shell
$# contains the number of command line arguments sent to the script
$* contains the current argument list as individual tokens. The construct "$*" glues the argument list into a single string.
$1, $2, $3, ... etc. contain the individual argument strings sent from the command line.

Here is an example of how the $1, $2, ... variables work...

If a script is called like this:

myscript these are the args

then within the script, the special variables $1, $2, $3, and $4 will contain the following values:

$1 - "these"
$2 - "are"
$3 - "the"
$4 - "args"
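
The same values can be inspected from inside a script. This sketch uses 'set --' to stand in for the command line shown above. That command and the variable names are additions made for illustration.

```shell
#!/bin/sh
set -- these are the args     # simulate: myscript these are the args

count=$#                      # number of arguments
first=$1
fourth=$4
all="$*"                      # the whole list glued into one string
pid=$$                        # process ID of this shell

echo "$count arguments: $all"
echo "first=$first fourth=$fourth pid=$pid"
```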

The only numeric values available for command line arguments are 0 - 9. $0 contains the name of the script, while $1, $2, ... contain the argument strings. If more than nine arguments are entered at the command line and you need to use all of them, the values must be shifted to the left with the keyword 'shift' which reassigns $1 to what formerly was $2, and reassigns $2 to what formerly was $3, ... etc. Here is an example...

if the script body contained...

#!/bin/sh
while [ $# -gt 0 ] ; do
echo $1
shift
done

...and the command line entered was:
myscript one two three

the output would be...
one
two
three
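
To reach an argument beyond the ninth, 'shift' can also be given a count. In this sketch, eleven sample arguments are simulated with 'set --', and the variable names are illustrative.

```shell
#!/bin/sh
set -- a b c d e f g h i j k     # simulate eleven command line arguments

total=$#
shift 9                          # discard the first nine in one step
tenth=$1                         # what was the tenth argument is now $1
left=$#

echo "$total arguments; the tenth is $tenth; $left remain after shifting"
```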

The '$?' variable is useful for testing the exit value of the last-executed command. The 'exit value' of a Unix process is similar to the 'return value' of a C/C++ function or program. The value 0 (zero) generally means "success", and a non-zero value such as 1 generally means failure. Exit values can be used within scripts, or simply from the shell.

For example, if you enter...
'date' at the command line

The Unix date command will execute normally.

If you then type...
echo $?

...an exit value of 0 will display. Since scripts can call other scripts, as well as standard Unix commands, it is often useful to examine exit values for success or failure.
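
Within a script, the exit value is usually saved or tested right away, because every later command overwrites '$?'. A minimal sketch follows. The child command sh -c 'exit 3' is a made-up stand-in for a failing program.

```shell
#!/bin/sh
date > /dev/null              # run a command that normally succeeds
date_status=$?                # save $? at once, before it is overwritten

# The child shell exits with the value 3. The '||' runs the
# assignment only when the command on its left fails.
sh -c 'exit 3' || child_status=$?

if [ "$date_status" -eq 0 ] ; then
    echo "date succeeded"
fi
echo "child exit value: $child_status"
```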


Creating arrays of strings by concatenation
Just as command line arguments are an array of strings that can be used within a script, other arrays of strings can be created and used as well. This is done by concatenation of whitespace-separated tokens.

For example...
FIELDS="Name Address Phone"
CONTACT=""

for element in $FIELDS ; do
printf "$element: "
read REPLY
CONTACT="$CONTACT $REPLY"
done
echo $CONTACT

...would interactively collect one data record, concatenate it into a single string, and then display it.
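
The same loop can be run without a keyboard by feeding the answers from a here document, and the finished string can then be walked like an array. The sample answers are invented. Note one limit of this technique: a reply that contains spaces would later split into several tokens.

```shell
#!/bin/sh
FIELDS="Name Address Phone"
CONTACT=""

for element in $FIELDS ; do
    printf "%s: " "$element"
    read REPLY
    CONTACT="$CONTACT $REPLY"      # quotes are required around the space
done <<EOF
Ada
London
555-0100
EOF
echo

# Walk the record token by token, as if it were an array.
count=0
for item in $CONTACT ; do
    count=`expr $count + 1`
done
echo "$count fields collected:$CONTACT"
```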