Shell scripting is an extremely powerful tool that programmers sometimes
overlook. I think this is due to the simplistic procedural approach it
presents. Programmers tend to place their trust in complex interpreters
implementing complex paradigms like OOP or functional programming. The success
of Python and Perl in system administration circles, for example, is
undeniable. Yet you'll often hear old Unix folks chuckle in their beards when
they see a young (well-intentioned) sysadmin defining lengthy classes or
importing heavy modules.
The truth is that, despite its simplistic aspects, shell scripting is just like
any other Unix tool: extremely powerful. The following is a list of tips and
good practices I have gathered over the past year and a half or so. They are
written primarily with bash in mind (bash being the de facto standard shell on
most Linux distributions), but most of them also apply to other shells and to
scripting in general.
Disclaimer: This is not a scripting tutorial. I assume that you are
already familiar with the syntax and features of at least one shell, preferably bash.
Unix tools
Scripting can (and should) take great advantage of the existing tools at your
disposal. If you're working on Unix, there are a few common tools you're almost
certain to come across. This is why my first piece of advice is to familiarize
yourself with the following tools, and with when to use them:
- find: When applying a command to multiple files.
- grep: When detecting an event recorded in files.
- sed: When modifying streams such as files or command output.
- awk: When extracting data from a stream.
Awk in particular is a full-blown programming language that excels at file
manipulation and event detection.
Moreover, learn to use smaller tools like cut or tr. A good script will always
leverage the power of existing tools instead of needlessly reinventing the
wheel.
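To make the division of labour concrete, here is a minimal sketch that runs each of these tools once over the same small log file. The file, its contents and the variable names are made up for the illustration.

```shell
#!/bin/bash
# Hypothetical sample data: a small log file in a throwaway directory.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/tools.XXXXXX")
printf '%s\n' \
    '2024-01-01 INFO alice login' \
    '2024-01-01 ERROR bob timeout' \
    '2024-01-02 ERROR alice disk' > "$workdir/app.log"

# find: apply a command (here wc -l) to every matching file.
find "$workdir" -name '*.log' -exec wc -l {} \;

# grep: detect an event; -c counts the matching lines.
error_count=$(grep -c 'ERROR' "$workdir/app.log")

# sed: modify the stream, here renaming the log level on the fly.
sed 's/ERROR/FAILURE/' "$workdir/app.log"

# awk: extract data, here the user (field 3) of every ERROR line.
error_users=$(awk '$2 == "ERROR" { print $3 }' "$workdir/app.log")

# cut and tr: take the second field of the first line, then lower-case it.
first_level=$(cut -d' ' -f2 "$workdir/app.log" | head -n 1 | tr '[:upper:]' '[:lower:]')

rm -rf "$workdir"
```

Each command does one small job, and the script only glues them together.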
Factor your code into functions
You'd think that I shouldn't have to write this. I don't understand the dark
magic that makes even the most rigorous programmers lose their common sense when
approaching a script. The software industry is filled with poorly written
scripts that overlook even the simplest concept of "avoiding useless (and
dangerous) repetition through the use of functions".
Indeed, modern shells make it very easy to create your own set of generic
functions and import it into scripts by sourcing a file.
However, more often than not, you won't find ready-made functions easily. Sure,
when you are looking for something specific, you may come across very
interesting libraries (like that JSON parser in bash). Most of the time, though,
you will want to implement libraries that are bound to the specifics of your
environment: functions that are not meant to be used outside the scope of your
job, but are heavily used inside it.
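As a rough illustration of sourcing, the sketch below writes a tiny library file and then imports it. The log_msg function and the temporary library file are hypothetical; a real library would live in a shared location that every script knows about.

```shell
#!/bin/bash
# Hypothetical library file, created here only to keep the example
# self-contained.
lib=$(mktemp "${TMPDIR:-/tmp}/mylib.XXXXXX")
cat > "$lib" <<'EOF'
# log_msg LEVEL MESSAGE...: print a message prefixed with its level.
log_msg () {
    local level=$1
    shift
    printf '[%s] %s\n' "$level" "$*"
}
EOF

# Import every function defined in the library into the current script.
# "." is the portable spelling; bash also accepts "source".
. "$lib"
rm -f "$lib"

log_msg INFO "library loaded"
```

Once the file is sourced, log_msg behaves exactly as if it had been defined in the script itself.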
Real-life example taken from my job
The naming convention for machines at work follows a simple
schema: [dtp]host[0-9][0-9].
The first letter stands for:
- d: development machines
- t: testing machines
- p: production machines
In some cases we have to append a suffix to the machine name:
- dhost09 becomes dhost09.dev
- thost43 becomes thost43.test
- phost01 becomes phost01.prd
The following bash function will never be used outside the boundaries of my
office, but it sure fills our scripts at work:
append_suffix () {
    # Append ".<first letter>" to the name, then expand that letter.
    sed -e 's/^\([dtp]\)\(.*\)$/\1\2.\1/' \
        -e 's/\.d$/.dev/' \
        -e 's/\.t$/.test/' \
        -e 's/\.p$/.prd/' <(printf '%s\n' "$1")
}
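For comparison, the same naming convention can probably be handled without starting a sed process at all. The variant below is only a sketch: append_suffix_case is a hypothetical name, and rejecting unknown prefixes is an assumption that is not part of the original function.

```shell
# Hypothetical variant: same convention, using a case statement.
append_suffix_case () {
    local host=$1
    case $host in
        d*) printf '%s.dev\n'  "$host" ;;
        t*) printf '%s.test\n' "$host" ;;
        p*) printf '%s.prd\n'  "$host" ;;
        *)  printf 'unknown host type: %s\n' "$host" >&2
            return 1 ;;
    esac
}
```

Whichever form you prefer, the important part is that the rule lives in a single function.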
My point is simple: detect the points of repetition in your scripts, factor them
into functions, share them and force them upon your team members. Your
environment can only get better.
getopts and Usage() function
A good script, just like any good Unix tool, must present a solid command line
interface, and that means accepting dashed options. Both bash and ksh provide
the getopts builtin for this; make sure to use (and abuse) it.
Also, remember that good tools are self-documenting. Write a function called
usage() that describes how to use the script. Something like this:
usage() {
    echo "Usage:"
    echo "script.sh [-h] [-v] -f file.txt -c command"
}
Finally, be consistent in the way this function is called. I was part of a team
that consistently called usage() when the script was run without arguments. I
was part of another team that called it for the "-h" option.
It doesn't matter what the convention is. The point is that you should be able
to go back to your script a year after you last used it and pick up its use as
fast as possible.
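Here is a minimal sketch of how getopts and a usage function could fit together for the interface shown above. The parse_args helper and the variables it sets (want_help, verbose, file, cmd) are hypothetical names, and treating -f and -c as mandatory is an assumption read off the usage line.

```shell
#!/bin/bash
# Sketch of the interface "script.sh [-h] [-v] -f file.txt -c command".

usage () {
    echo "Usage:"
    echo "script.sh [-h] [-v] -f file.txt -c command"
}

# parse_args is a hypothetical helper; it fills the variables below.
parse_args () {
    local opt
    OPTIND=1
    want_help=0 verbose=0 file="" cmd=""
    # The leading ":" selects silent error reporting, so the script prints
    # its own messages; "f:" and "c:" each expect an argument.
    while getopts ":hvf:c:" opt; do
        case $opt in
            h)  want_help=1; return 0 ;;
            v)  verbose=1 ;;
            f)  file=$OPTARG ;;
            c)  cmd=$OPTARG ;;
            :)  echo "Option -$OPTARG requires an argument" >&2; return 1 ;;
            \?) echo "Unknown option: -$OPTARG" >&2; return 1 ;;
        esac
    done
    # -f and -c are mandatory in this interface.
    if [ -z "$file" ] || [ -z "$cmd" ]; then
        echo "Both -f and -c are required" >&2
        return 1
    fi
}

# Typical call site, at the bottom of the script:
#   parse_args "$@" || { usage >&2; exit 1; }
#   if [ "$want_help" -eq 1 ]; then usage; exit 0; fi
```

Keeping the parsing in one function means the calling convention for usage() is decided in exactly one place.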
The Shebang #!
The shebang is the first line of the script; it names the executable that will
be called to run the script. It goes without saying: always define a shebang in
your script. You'll thank me once you have multiple shells installed on your
machine.
Don't ever make any assumptions about the shell used. You should know which
shell is used on your system and the path to its executable.
Note about Linux systems
On Linux systems, bash has become the de facto shell. To avoid breaking
backward compatibility, the old shell sh is still available as /bin/sh, but it
is now usually a symbolic link: to bash on many distributions, and to dash on
others such as Debian and Ubuntu.
That does not mean you should put it in your shebangs!
Some old tutorials and teachers recommend writing a shebang like this:
#!/bin/sh
This only works on the assumption that /bin/sh will always point to the shell
you wrote the script for. Avoid that. It makes your script less portable.
Specify your shell. Directly.
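A script written for bash could start as sketched below. The /bin/bash path is an assumption about where bash lives on your system, and the guard that follows is an optional extra of mine, not something the shebang requires.

```shell
#!/bin/bash
# The shebang above names the interpreter explicitly instead of /bin/sh.

# Optional guard: BASH_VERSION is only set when bash is the interpreter,
# so this catches the script being started as "sh script.sh".
if [ -z "${BASH_VERSION:-}" ]; then
    echo "This script requires bash" >&2
    exit 1
fi

# Bash-only syntax is safe from here on, for example arrays.
shells=(bash ksh zsh)
echo "Running under bash ${BASH_VERSION}; ${#shells[@]} shells listed"
```

The guard matters because the shebang is ignored when someone passes the script to another interpreter by hand.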
The use of temporary files
You may be tempted to use temp files. After you learn some piping and shell
redirection, you may be tempted to avoid temp files altogether. It is true that
writing to disk is a performance overhead that you may want to avoid. But
you also want to keep in mind the readability and maintainability of the script.
Whatever your reason for using one, avoid writing something like this:
export TMPFILE="tmpfile.txt"
Always prefer the mktemp(1) command. It creates a temporary file with a unique
filename. By default the temp file lives in the /tmp directory, which many
systems clean periodically or at boot, so a forgotten file is less of a worry
(removing it yourself when you are done is still the polite thing to do). This
good practice makes your script easier to port to another system. It also makes
sure your script won't break because of overlapping names.
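A minimal sketch of that pattern follows. The "myscript" prefix is a made-up name, and the trap is one common way to remove the file explicitly rather than relying on the system to do it.

```shell
#!/bin/bash
# "myscript" is a hypothetical prefix; mktemp replaces the X's to make the
# name unique. Abort if the file cannot be created.
tmpfile=$(mktemp "${TMPDIR:-/tmp}/myscript.XXXXXX") || exit 1

# Remove the file when the script exits, whatever the reason.
trap 'rm -f "$tmpfile"' EXIT

# From here on, use it like any other file.
printf 'scratch data\n' > "$tmpfile"
line_count=$(wc -l < "$tmpfile")
```

Two instances of the script running at the same time each get their own file, which a hard-coded name cannot guarantee.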
Avoid UUoC
UUoC is short for Useless Use of Cat. Cat is a Unix tool that is meant to
concatenate files. 99% of the time you use cat on a single file, there's a
better, more efficient way to do it. The most common example I find is:
cat file | grep pattern
that can be easily replaced by:
grep pattern file
With the second form, you replace two processes and a pipe with a single
process. The difference in performance may not be as important as it was 10
years ago, but that doesn't justify the bad practice of forking useless
processes.
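The same reasoning applies to other commands. The sketch below shows a few typical rewrites, with the cat form kept as a comment above each one; the sample file and variable names are invented for the example.

```shell
# Hypothetical sample file for the comparison.
sample=$(mktemp "${TMPDIR:-/tmp}/uuoc.XXXXXX")
printf 'alpha\nbeta\ngamma\n' > "$sample"

# Instead of: cat file | wc -l
# redirect the file into the command.
line_count=$(wc -l < "$sample")

# Instead of: cat file | awk '...'
# pass the file name as an argument.
first_word=$(awk 'NR == 1 { print $1 }' "$sample")

# Instead of: cat file | while read line
# redirect the file into the loop.
while IFS= read -r line; do
    last_line=$line
done < "$sample"

rm -f "$sample"
```

The last rewrite has a side benefit in bash: the loop runs in the current shell instead of a pipeline subshell, so last_line is still set after the loop.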
How to kill a process
Repeat after me: kill -9 should be a last resort. Repeat it one more time. Out
loud.
I have seen far too many people rush into the use of a -9 (SIGKILL). "It's
faster," they say. The truth is, that's not how you should proceed. SIGKILL
cannot be caught. That means the process gets no chance to clean up after
itself: open files won't be closed properly and buffers won't be flushed.
Generally speaking, the exit won't be graceful and will leave dirty stains all
over the place.
From this old Usenet post:
No no no. Don't use kill -9.
It doesn't give the process a chance to cleanly:
1) shut down socket connections
2) clean up temp files
3) inform its children that it is going away
4) reset its terminal characteristics
and so on and so on and so on.
Generally, send 15, and wait a second or two, and if that doesn't
work, send 2, and if that doesn't work, send 1. If that doesn't,
REMOVE THE BINARY because the program is badly behaved!
Don't use kill -9. Don't bring out the combine harvester just to tidy
up the flower pot.
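The escalation described in the post could be scripted roughly as follows. This is a sketch under assumptions: stop_gently is a hypothetical helper name, the two-second default pause is my reading of "a second or two", and the function reports failure instead of removing any binary.

```shell
# stop_gently is a hypothetical helper: stop_gently PID [DELAY]
# It sends TERM (15), then INT (2), then HUP (1), pausing DELAY seconds
# (2 by default) after each signal, and never falls back to KILL.
stop_gently () {
    local pid=$1 delay=${2:-2} sig
    for sig in TERM INT HUP; do
        # kill -0 delivers no signal; it only checks that the PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        kill -s "$sig" "$pid" 2>/dev/null
        sleep "$delay"
    done
    # Succeed only if the process is gone after the last signal.
    ! kill -0 "$pid" 2>/dev/null
}
```

A caller can then decide what to do when the function fails, which leaves kill -9 as the deliberate last resort it should be.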
Don't parse the output of ls
A great thing about shells is looping. Sometimes, you will be tempted to do
something like this:
# Do `command` on all files starting with i
for f in $(ls i*); do
    command "$f"
done
Avoid this as much as possible. The loop will break if the ls command comes
across a "funky" character, like a newline ("\n") or a space, in the name of a
file. The above example can be replaced by a call to find (which, unlike the
loop, also descends into subdirectories), like this:
find . -name 'i*' -exec command {} \;
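When you only care about the current directory, letting the shell expand the pattern itself is another option worth knowing. The sketch below uses a throwaway directory and invented file names to show that names with spaces survive intact.

```shell
# Hypothetical sandbox containing an awkward file name.
sandbox=$(mktemp -d "${TMPDIR:-/tmp}/globdemo.XXXXXX")
touch "$sandbox/i one.txt" "$sandbox/i_two.txt" "$sandbox/other.txt"

# Let the shell expand the pattern: each file name stays a single word,
# whatever characters it contains.
count=0
for f in "$sandbox"/i*; do
    [ -e "$f" ] || continue   # the pattern matched nothing
    count=$((count + 1))
    printf 'processing: %s\n' "$f"
done

# The find equivalent, with the pattern quoted so that the shell does not
# expand it before find sees it.
find "$sandbox" -type f -name 'i*' -exec wc -c {} \;

rm -rf "$sandbox"
```

Both forms hand the file name to the command as one argument, which is exactly what parsing the output of ls cannot promise.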