is reached. If a pattern N (N = 1, 2, ...) is omitted, then the corresponding action N
is executed every time the program reaches that line of code in the cycle. If { action N }
is omitted, then the entire input line is printed as the default action. Observe that
by default an awk program does not copy/print an input line (similar to sed -n).
Thus, printing has to be triggered either by an address pattern N with no corresponding
action N, which selects the pattern space for printing, or by a separate print statement
within action N (similar to the operator p in sed).
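The three cases described above can be sketched as follows. This is a minimal illustration, assuming a POSIX awk; the three sample words and the variable names are made up for the example.

```shell
#!/bin/sh
# Hypothetical sample input: one word per line.
input='nation
word
station'

# Pattern without action: matching lines are printed by the default action.
pattern_only=$(printf '%s\n' "$input" | awk '/ion$/')

# Action without pattern: the action is executed for every input line.
action_only=$(printf '%s\n' "$input" | awk '{ print NR ": " $0 }')

# Pattern with an action that contains no print: nothing is copied to the output.
silent=$(printf '%s\n' "$input" | awk '/ion$/ { n++ }')

printf '%s\n' "$pattern_only" "$action_only" "[$silent]"
```

The last case shows why an explicit print statement is needed as soon as an action is supplied: the action replaces the default printing instead of adding to it.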
Example: The following program numberOfLines prints the number of lines
in a file (see footnote 16).
#!/bin/sh
# numberOfLines
awk 'END {print NR}' $1
numberOfLines prints the built-in counter NR (number of records) at the end of the
file. By default (this setting can be changed), records are the lines of input delimited
by newline characters. The delimiter for records is stored in the built-in variable RS
(record separator).
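To illustrate how RS affects the counter NR, here is a small sketch, assuming a POSIX awk. The sample inputs are invented; setting RS in a BEGIN action is one common way to change it, not necessarily the one used elsewhere in this text.

```shell
#!/bin/sh
# Default RS (newline): three input lines give three records.
lines=$(printf 'one two\nthree four\nfive six\n' | awk 'END { print NR }')

# RS set to a semicolon before any input is read: the records are now
# the semicolon-delimited pieces of the input.
pieces=$(printf 'a;b;c;d' | awk 'BEGIN { RS = ";" } END { print NR }')

echo "$lines $pieces"
```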
Application (counting the occurrence of word patterns in text): The following
program countIonWords counts the occurrences of words ending in “ion” in a text
file. Together with a search for the occurrence of words ending in “ment”, this gives
an indication of the usage of academic vocabulary in the text (cf. [13, 14]).
#!/bin/sh
# countIonWords
leaveOnlyWords $1 | oneItemPerLine - | awk '/ion$/' - | numberOfLines -
Explanation: The first two programs of the pipe deliver one word per line into
the pipe. The awk program /ion$/ (see footnote 17) invokes the default action, i.e., printing
the pattern space (the line), for words ending ($) in “ion”. The lines which contain such words
are then counted by numberOfLines.
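The pipe can be tried out without the helper programs. The following is a self-contained sketch, under the assumption that leaveOnlyWords followed by oneItemPerLine amounts to putting each run of letters on its own line; here tr is used as a stand-in for both, and the sample sentence is invented. The real helper programs may treat punctuation, digits, or case differently.

```shell
#!/bin/sh
# Hypothetical sample text.
text='Education needs attention and good information.'

count=$(printf '%s\n' "$text" |
    tr -cs 'A-Za-z' '\n' |      # stand-in: one word per line, letters only
    awk '/ion$/' |              # keep the words ending in "ion"
    awk 'END { print NR }')     # same counting step as numberOfLines

echo "$count"
```

Note that the pattern /ion$/ also selects words such as “lion” or “onion”, so the count is only a rough indicator.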
The Format of an awk Command
As shown above, any awk command has the following format:
pattern { action };
The closing semicolon is optional. If a semicolon follows an awk command, then
another command can follow on the same line. The commands with the BEGIN and
the END pattern must be on separate lines.
If pattern matches the input record (pattern space), then action is carried out.
pattern can be very similar to address patterns in sed. However, much more complicated
address patterns are possible in awk. Compare the listings in Appendix A.1
and Appendix A.2. An action is a sequence of statements that are separated by
semicolons (;) or are on different lines.
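The format rules above can be sketched as follows, assuming a POSIX awk. The sample words and the counters n and w are hypothetical and serve only to show where semicolons and line breaks may appear.

```shell
#!/bin/sh
# Two commands on the same line, separated by a semicolon.
same_line=$(printf 'nation\nword\n' |
    awk '/ion$/ { print "ion" }; /^w/ { print "w" }')

# Statements inside an action separated by a semicolon or by line breaks;
# the END command is placed on its own line.
out=$(printf 'nation\nword\nstation\n' | awk '
/ion$/ { n++; print "match: " $0 }   # two statements separated by a semicolon
/^w/ {
    print "starts with w: " $0       # statements on separate lines
    w++
}
END { print n " ion-words, " w " w-words" }
')

printf '%s\n' "$same_line" "$out"
```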
16 Consult the UNIX manual pages for wc in this regard (i.e., type man wc).
17 Alternatively, the sed program sed '/ion$/!d' - could be used in the pipe.
/ion$/!d deletes (using the deletion operator d) every line (here: word) that does
not (encoded by the negation operator ! of sed) end in “ion”, so that only the words
ending in “ion” remain. Consult also man grep in this regard.
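As a quick check of footnote 17, the following sketch compares the awk filter with the sed and grep alternatives on an invented word list. It assumes standard POSIX versions of the three tools.

```shell
#!/bin/sh
# Hypothetical word list, one word per line.
words='nation
word
station
ionic'

with_awk=$(printf '%s\n' "$words" | awk '/ion$/')      # default action prints matches
with_sed=$(printf '%s\n' "$words" | sed '/ion$/!d')    # delete all non-matching lines
with_grep=$(printf '%s\n' "$words" | grep 'ion$')      # print matching lines

printf '%s\n' "$with_awk"
```

All three filters select the same lines here; “ionic” is rejected because the pattern is anchored to the end of the line.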