sortByVowel produces no direct output. It is clear how to generalize this
procedure to a more significant analysis, e.g., searches for specific patterns or
searches for phrases.
12.4 Extending the Capabilities with awk
awk is a simple programming language based on pattern recognition in the current
line of input and operations on chunks of that line. In that regard, it is very
similar to sed. In contrast to sed, awk allows string variables and numerical
variables. Consequently, one can accomplish with much more ease a variety of
elaborate manipulations of strings in the current line of input that may depend,
in particular, on several previous lines of input. In addition, one can perform
numerical operations on files, such as accounting and keeping statistics of things
(cf. the example countFrequencies in Section 12.2.2). Good introductions to awk
are [4, 5, 30].
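The following is a minimal sketch of both points: a string variable that carries information from the previous line, and a numerical variable that accumulates a total. The sample data and the variable names sum and prev are assumptions made for illustration only.

```shell
# Hypothetical sample data: an item name and an amount on each line.
result=$(printf 'apples 3\npears 5\nplums 4\n' | awk '
# sum is a numerical variable; prev is a string variable that still
# holds the first field of the previous line when the next one is read.
{ sum += $2; if (NR > 1) print prev " -> " $1; prev = $1 }
END { print "total", sum }')
echo "$result"
```

Neither variable needs a declaration: awk creates them on first use, as an empty string or as zero depending on context.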
12.4.1 Overview of awk Programming and Its Applications
As mentioned above, awk is based on pattern recognition in the current line of input
and operations on chunks of that line. awk uses patterns as addresses, much as
sed does (cf. Section 12.3.1 and Appendix A.1). Furthermore, awk automatically
partitions the current line of input (the input record; see footnote 15) into an
array of "fields". The fields of the current line of input are usually the maximal
strings of non-whitespace characters in the line. One typical use of awk is matching
and rearranging the fields in a line, similar to the tagging in the substitution
command of sed. However, the tagging and reuse of tagged expressions in the
substitution command of sed can usually only be matched by rather complicated
programming in awk.
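As a small illustration of field rearrangement, the sketch below swaps two whitespace-separated fields, once with awk's automatic field splitting and once with tagged expressions in sed. The input lines are invented for the example, and the sed expression assumes exactly two fields separated by a single blank.

```shell
# Hypothetical input: "surname forename" on each line.
input='Smith John
Doe Jane'

# awk: $1 and $2 are the first and second field of the current record.
byAwk=$(printf '%s\n' "$input" | awk '{ print $2, $1 }')

# sed: tag both words with \( \) and reuse them as \2 and \1.
bySed=$(printf '%s\n' "$input" | sed 's/^\([^ ]*\) \([^ ]*\)$/\2 \1/')

echo "$byAwk"
```

For plain whitespace-delimited fields like these, the two approaches give the same result. The difference shows up when the tagged pieces do not coincide with fields.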
The Format of an awk Program
Programs in awk need no compilation. An awk program looks like the following:
    awk 'BEGIN     { action B };
         pattern 1 { action 1 };
         pattern 2 { action 2 };
         ...
         END       { action E }'
action B is executed before the input file is processed. action E is executed after the
input file is processed. The lines with BEGIN and END can be omitted.
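A concrete program of this shape might look like the sketch below. The patterns /alpha/ and /beta/, the counters a and b, and the input lines are assumptions chosen only to make the structure visible.

```shell
# Hypothetical input of four lines, piped into awk on standard input.
report=$(printf 'alpha\nbeta\nalpha beta\ngamma\n' | awk '
BEGIN   { print "start" }                    # runs before any input is read
/alpha/ { a++ }                              # runs for each line matching alpha
/beta/  { b++ }                              # runs for each line matching beta
END     { print "alpha:", a, "beta:", b }    # runs after the last line
')
echo "$report"
```

The third input line matches both patterns, so both actions run for it within the same cycle.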
Every line in the above program contains an awk command (ignoring the leading
string awk ' and the trailing '). One can store a list of awk commands in a file,
say awkCommands, and use awk -f awkCommands targetFile to execute the program on
targetFile.
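The sketch below shows this usage under assumed file contents: the names awkCommands and targetFile follow the text, while the commands and the data written into them are invented for illustration.

```shell
# Work in a temporary directory so that no existing files are touched.
workdir=$(mktemp -d)

# The command file holds the program without the surrounding awk '...'.
cat > "$workdir/awkCommands" <<'EOF'
/error/ { n++ }
END     { print n + 0, "matching lines" }
EOF

# Hypothetical target file with three lines, two of which contain "error".
printf 'ok\nerror: disk\nerror: net\n' > "$workdir/targetFile"

out=$(awk -f "$workdir/awkCommands" "$workdir/targetFile")
echo "$out"
rm -r "$workdir"
```

Writing n + 0 forces a numerical value, so the program prints 0 rather than an empty string when no line matches.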
awk operates on input records (usually lines) in a cycle, just like sed. action 1 is
executed if pattern 1 matches the original input record. After that, action 2 is
executed if pattern 2 matches the current, possibly altered pattern space and the
cycle was not terminated by action 1. This continues until the second-to-last line
of the awk program
Footnote 15: This is awk jargon. In sed jargon, this was formerly called the pattern space.