There is no direct output by sortByVowel. It is clear how to generalize this procedure to a more significant analysis, e.g., searches for specific patterns or searches for phrases.
12.4 Extending the Capabilities with awk
awk is a simple programming language based on pattern recognition in the current line of input and operations on chunks of that line. In that regard, it is very similar to sed. In contrast to sed, awk allows string variables and numerical variables. Consequently, one can much more easily perform a variety of elaborate manipulations of strings in the current line of input that may depend, in particular, on several previous lines of input. In addition, one can perform numerical operations on files, such as accounting and keeping statistics (cf. the example countFrequencies in Section 12.2.2). Good introductions to awk are [4, 5, 30].
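The role of variables can be illustrated with a minimal sketch. The function name count_repeats, the variable names prev and repeats, and the sample input are assumptions made for this illustration only. The string variable remembers the previous line, and the numerical variable keeps a running count:

```shell
# Report every line that is identical to the line before it and count
# such repetitions. count_repeats, prev and repeats are hypothetical names.
count_repeats() {
  awk '
    NR > 1 && $0 == prev { repeats++; print "repeated: " $0 }  # compare with previous line
    { prev = $0 }                                              # string variable: remember this line
    END { print "total repeats: " (repeats + 0) }              # numerical variable: running count
  '
}

# Hypothetical sample input, one word per line.
printf 'the\nthe\ncat\nsat\nsat\n' | count_repeats
```

The expression repeats + 0 forces a numerical value, so that 0 is printed even if no line was ever repeated.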
12.4.1 Overview of awk Programming and Its Applications
As mentioned above, awk is based on pattern recognition in the current line of input and operations on chunks of that line. awk uses pattern recognition as addresses similar to sed (cf. Section 12.3.1 and Appendix A.1). Furthermore, awk automatically partitions the current line of input (the input record; see footnote 15) into an array of “fields”. The fields of the current line of input are usually the maximal strings of non-whitespace characters in the line. One typical use of awk is matching and rearranging the fields in a line, similar to the tagging in the substitution command of sed. However, the tagging and reuse of tagged expressions in the substitution command of sed can usually only be matched by rather complicated programming in awk.
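As a minimal sketch of rearranging fields, the following swaps the first two fields of every line. The function name swap_first_two and the sample input are assumptions for illustration. With the default field separator, leading and trailing white space is ignored and runs of blanks separate fields:

```shell
# Print the second field followed by the first field of each input line.
# swap_first_two is a hypothetical name chosen for this sketch.
swap_first_two() {
  awk '{ print $2, $1 }'   # $1, $2, ... are the fields of the current input record
}

# Hypothetical sample input; note the irregular spacing in the second line.
printf 'hello world\n  good   morning  \n' | swap_first_two
```

The comma in the print statement inserts the output field separator (a single blank by default) between the two fields.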
The Format of an awk Program
Programs in awk need no compilation. An awk program looks like the following:
    awk 'BEGIN { actionB };
    pattern1 { action1 };
    pattern2 { action2 };
    ...
    END { actionE }'
actionB is executed before the input file is processed. actionE is executed after the input file is processed. The lines with BEGIN and END can be omitted.
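A minimal sketch of a complete program in this format follows. The function name count_matches, the pattern /the/, the variable n and the sample input are assumptions for illustration. The BEGIN action initializes a counter, one pattern-action pair updates it, and the END action reports the result:

```shell
# Count the input lines that contain the string "the".
# count_matches and n are hypothetical names chosen for this sketch.
count_matches() {
  awk 'BEGIN { n = 0 }                  # executed before any input is read
  /the/ { n++ }                         # executed for every matching input record
  END   { print n " matching lines" }'  # executed after all input is processed
}

# Hypothetical sample input.
printf 'the cat\na dog\nthe end\n' | count_matches
```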
Every line in the above program contains an awk command (ignoring the leading string awk ' and the trailing '). One can store a list of awk commands in a file (say) awkCommands and use awk -f awkCommands targetFile to execute the program on targetFile.
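The use of the -f option can be sketched as follows. The contents of awkCommands and targetFile are invented for this illustration, and both files are created in a temporary directory (assuming the common mktemp utility is available) so that nothing in the working directory is overwritten:

```shell
# Create hypothetical example files in a temporary directory.
dir=$(mktemp -d)

# The awk program: print matching lines prefixed by their record number.
cat > "$dir/awkCommands" <<'EOF'
/awk/ { print NR ": " $0 }
EOF

# The input file the program is applied to.
printf 'sed edits streams\nawk splits fields\n' > "$dir/targetFile"

# Run the stored program on the input file and show the result.
result=$(awk -f "$dir/awkCommands" "$dir/targetFile")
echo "$result"

rm -r "$dir"
```

Note that the file awkCommands contains only the awk commands themselves, without the leading awk ' and the trailing '.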
awk operates on input records (usually lines) in a cycle just like sed. action1 is executed if pattern1 matches the original input record. After that, action2 is executed if pattern2 matches the current, possibly altered input record and the cycle was not terminated by action1. This continues until the second-to-last line of the awk program
15 This is awk-jargon. In sed-jargon, this was formerly called the pattern space.