is reached. If a patternN (N = 1, 2, ...) is omitted, then the corresponding actionN is executed every time the program reaches that line of code in the cycle. If {actionN} is omitted, then the entire input line is printed by the default action. Observe that by default an awk program does not copy/print an input line (similar to sed -n). Thus, printing has to be triggered either by an address patternN with no corresponding actionN, which selects the pattern space, or by a separate print statement within actionN (similar to the operator p in sed).
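
The three cases above can be illustrated with a minimal sketch. The sample input and the variable names (sample, pattern_only, action_only, silent) are assumptions made for illustration only and do not come from the text.

```shell
#!/bin/sh
# Hypothetical three-line sample input (not from the text).
sample='alpha
beta
gamma'

# Pattern only, no action: matching lines are printed by the default action.
pattern_only=$(printf '%s\n' "$sample" | awk '/^b/')

# Action only, no pattern: the action runs for every input line.
action_only=$(printf '%s\n' "$sample" | awk '{print NR ": " $0}')

# Pattern with an action that contains no print statement: nothing is printed,
# because awk does not copy input lines to the output on its own.
silent=$(printf '%s\n' "$sample" | awk '/^b/ {count++}')

printf '%s\n' "$pattern_only"
printf '%s\n' "$action_only"
```

Here the first awk call prints only the line beta, the second prints every line prefixed with its record number, and the third produces no output.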
Example: The following program numberOfLines prints the number of lines¹⁶ in a file.
    #!/bin/sh
    # numberOfLines
    awk 'END {print NR}' "$1"
numberOfLines prints the built-in counter NR (number of records) at the end of the file. By default, records are the lines of input delimited by newline characters; this setting can be changed. The delimiter for records is stored in the built-in variable RS (record separator).
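
As a small sketch of how NR depends on RS, the following script counts records twice: once with the default separator and once with RS set to a semicolon. The input strings and the variable names lines and pieces are hypothetical, and setting RS inside a BEGIN command is one common way to change it, assumed here rather than taken from the text.

```shell
#!/bin/sh
# Default record separator (newline): three input lines give NR = 3.
lines=$(printf 'one\ntwo\nthree\n' | awk 'END {print NR}')

# RS set to a semicolon before any input is read: records are now the
# semicolon-delimited pieces, so the same counter reports four of them.
pieces=$(printf 'a;b;c;d' | awk 'BEGIN {RS = ";"}
END {print NR}')

echo "$lines $pieces"
```

The same END {print NR} command therefore counts lines in the first call and semicolon-separated items in the second.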
Application (counting the occurrence of word-patterns in text): The following program countIonWords counts the occurrences of words ending in “ion” in a text file. Together with a search for the occurrence of words ending in “ment”, this gives an indication of the usage of academic vocabulary in the text (cf. [13, 14]).
    #!/bin/sh
    # countIonWords
    leaveOnlyWords "$1" | oneItemPerLine - | awk '/ion$/' - | numberOfLines -
Explanation: The first two programs of the pipe deliver one word per line into the pipe. The awk program¹⁷ /ion$/ invokes the default action, print the pattern space (i.e., the line), for words ending ($) in “ion”. The lines which contain such words are then counted by numberOfLines.
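
The pipe can be tried without the helper programs using the sketch below. It is not the text's implementation: a single tr call is used as a stand-in for leaveOnlyWords and oneItemPerLine, and the sample sentence and variable names are hypothetical. The awk, sed and grep filters are run side by side to show that they select the same words.

```shell
#!/bin/sh
# Hypothetical sample text (not from the source).
text='The action and the motion of the station: a moment of attention.'

# Stand-in for leaveOnlyWords | oneItemPerLine (an assumption): tr turns
# every run of non-letters into a single newline, leaving one word per line.
words=$(printf '%s\n' "$text" | tr -cs 'A-Za-z' '\n')

# Three interchangeable filters that keep only the words ending in "ion",
# each followed by a count of the surviving lines.
with_awk=$(printf '%s\n' "$words" | awk '/ion$/' | awk 'END {print NR}')
with_sed=$(printf '%s\n' "$words" | sed '/ion$/!d' | awk 'END {print NR}')
with_grep=$(printf '%s\n' "$words" | grep -c 'ion$')

echo "$with_awk $with_sed $with_grep"
```

For this sentence all three counts are 4 (action, motion, station, attention); the word moment is not counted because it ends in “ment”.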
The Format of an awk Command
As shown above, any awk command has the following format:

    pattern { action };

The closing semicolon is optional. If a semicolon follows an awk command, then another command can follow on the same line. The commands with the BEGIN and the END pattern must be on separate lines.
If pattern matches the input record (pattern space), then action is carried out. pattern can be very similar to address patterns in sed. However, much more complicated address patterns are possible in awk. Compare the listings in Appendix A.1 and Appendix A.2. An action is a sequence of statements that are separated by semicolons (;) or are on different lines.
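
The format rules can be seen together in one short sketch. The word list and the counter names ion and ment are assumptions chosen for illustration. Two pattern-action commands share a line, separated by a semicolon, while the BEGIN and END commands sit on lines of their own, and the statements inside each action are separated by semicolons.

```shell
#!/bin/sh
# Hypothetical input: one word per line (not from the source).
words='nation
payment
motion
tree'

report=$(printf '%s\n' "$words" | awk '
BEGIN { ion = 0; ment = 0 }
/ion$/ { ion++ }; /ment$/ { ment++ }
END { print "ion:", ion; print "ment:", ment }')

printf '%s\n' "$report"
```

With this input the report shows two words ending in “ion” and one ending in “ment”.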
¹⁶ Consult the UNIX manual pages for wc in this regard (i.e., type man wc).
¹⁷ Alternatively, the sed program sed '/ion$/!d' - could be used in the pipe. /ion$/!d does not (encoded by the negation operator ! of sed) delete (using the deletion operator d) a line (here: word) that ends in “ion”. Consult also man grep in this regard.