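RS: The built-in variable RS contains the input record separator character.
Default: newline character. Note that one can set RS="\n\n" in awk versions that
accept a multi-character separator (gawk, for example); setting RS="" has a
similar effect in any POSIX awk. In that case, the built-in variable NR counts
paragraphs, if the input text file is single-spaced (that is, blank lines occur
only between paragraphs).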
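As a minimal sketch of paragraph counting, the following shell function sets the record separator in a BEGIN block and prints NR after the last record. The name count_paragraphs and the sample input are assumptions made for illustration. The sketch uses RS="" (POSIX paragraph mode) rather than RS="\n\n", because multi-character values of RS are not handled uniformly by all awk implementations.

```shell
#!/bin/sh
# count_paragraphs: hypothetical helper, not part of the original text.
# With RS="" awk treats each blank-line-separated paragraph as one record,
# so NR in the END block is the number of paragraphs read.
count_paragraphs() {
    awk 'BEGIN { RS = "" } END { print NR }' "$@"
}

# Sample input (invented): two paragraphs separated by one blank line.
printf 'first line\nsecond line\n\nthird line\n' | count_paragraphs
```

For the sample input above, the function prints 2.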
Representation of Strings, Concatenation and Formatting the Output
Strings of characters in awk used in printing and as constant string-values are
simply framed by double quotes ("). The special character sequences \\, \", \t and
\n represent the backslash, the double quote, the tab and the newline character in
strings, respectively. Otherwise, every character, including the blank, just
represents itself.
Strings or the values of variables containing strings are concatenated by listing
the strings or variables separated by blanks. For example, "aa" "bb" represents the
same string as "aabb".
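The rules for string constants and concatenation can be tried out with a short sketch such as the following. The function name concat_demo and the variable names first and joined are assumptions chosen for illustration; the program runs entirely in a BEGIN block, so it reads no input.

```shell
#!/bin/sh
# concat_demo: hypothetical helper illustrating string constants in awk.
concat_demo() {
    awk 'BEGIN {
        first = "aa"
        joined = first "bb"     # a variable and a constant, separated by a blank
        print joined            # same string as "aabb"
        print "\"" joined "\""  # \" yields a double quote inside a string
        print "a\tb"            # \t yields a tab
        print "back\\slash"     # \\ yields a single backslash
    }'
}

concat_demo
```

Each print statement writes one output line, so the four escape and concatenation cases can be inspected separately.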
A string (framed by double quotes) or a variable var containing a string can be
printed using the statements print string; or print var; respectively. The
statement print; simply prints the pattern space. Using the print function for
printing is sufficient for most purposes. However, in awk one can also use a second
printing function, printf, which acts similarly to the function printf of the
programming language C. See [40, 31, 4, 5] for further details, and consult the
manual pages for awk and printf for more information on printf. One may be
interested in printf if one wants to print the results of numerical computations,
such as statistical evaluations, for further processing by a plotting program such
as Mathematica [47] or gnuplot [17].
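As an illustration of printf for numerical output, the sketch below sums the numbers found one per input line and prints the count and the mean with two decimal places. The function name column_mean, the input layout (one number per line) and the output layout are assumptions made for this example, not something prescribed by the text.

```shell
#!/bin/sh
# column_mean: hypothetical helper; assumes one number per input line.
# printf takes a C-style format string: %d prints an integer,
# %.2f a number with two digits after the decimal point.
column_mean() {
    awk '{ sum += $0 }
         END { if (NR > 0) printf "%d %.2f\n", NR, sum / NR }' "$@"
}

# Sample input (invented): the values 1, 2 and 4.
printf '1\n2\n4\n' | column_mean
```

For the sample input, the function prints 3 2.33. Unlike print, printf does not add a newline on its own, which is why the format string ends in \n.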
Application (finding a line together with its predecessor in a text): The word
“because” is invariably used incorrectly by Japanese learners of English. Because
“because” is often used by Japanese learners of English to begin sentences (or
sentence fragments), it is necessary not only to print sentences containing the
string Because or because, but also to locate and print the preceding sentence.
The following program prints all lines in a file that match the pattern
/[Bb]ecause/ as well as the lines that precede such lines. We shall refer to it as
printPredecessorBecause.
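#!/bin/sh
# printPredecessorBecause
# Usage: printPredecessorBecause file
# First rule: print a matching line together with its predecessor.
# Second rule: remember every line for the next cycle.
awk '/[Bb]ecause/ { print previousLine "\n" $0 "\n\n" }
     { previousLine = $0 }' "$1"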
Explanation: The symbol/string $0 represents the entire line or pattern space in
awk. Thus, if the current line matches /[Bb]ecause/, then it is printed following
its predecessor, which was previously saved in the variable previousLine.
Afterwards, two newline characters are printed in order to structure the output;
together with the newline that print itself appends, they leave two empty lines
after each match. Should the first line of the input file match /[Bb]ecause/, then
previousLine is automatically initialized to the empty string, so that the output
starts with the first newline character that is printed, that is, with an empty
line. Finally, every line is saved in the variable previousLine, waiting for the
next input line and cycle.
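The behaviour described in the explanation, including the case of a match on the first line, can be checked with a small sketch. The function print_predecessor is a hypothetical wrapper around the same two awk rules that reads standard input instead of a named file, and the three sample sentences are invented for illustration.

```shell
#!/bin/sh
# print_predecessor: hypothetical wrapper around the two awk rules of
# printPredecessorBecause; reads standard input or the files given as arguments.
print_predecessor() {
    awk '/[Bb]ecause/ { print previousLine "\n" $0 "\n\n" }
         { previousLine = $0 }' "$@"
}

# Sample input (invented): lines 1 and 3 match /[Bb]ecause/.
printf '%s\n' 'Because it rained.' 'We stayed home.' 'We were tired because of it.' |
    print_predecessor
```

Since previousLine is still empty when the first line matches, the output begins with an empty line followed by that first line. The second match is printed after its predecessor, We stayed home.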
Fields and Field Separators
In the default mode, the fields of an input line are the maximal strings of
non-whitespace characters separated by blanks and tabs. They are addressed in the
pattern space
from left to right as field variables $(1) , $(2) , ... $(NF) where NF is a built-in