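RS: The built-in variable RS contains the input record separator character. Default: newline character. Note that one can set RS="\n\n" in awk implementations that accept a separator of more than one character (POSIX leaves that case unspecified; the portable paragraph mode is RS="", which treats blank lines as record separators). In that case, the built-in variable NR counts paragraphs, if the input text file is single-spaced.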
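The paragraph-counting idea can be sketched as follows. This is a minimal illustration, not part of the original text: it assumes the portable paragraph mode RS="" rather than RS="\n\n", and the function name countParagraphs and the sample input are made up for the example.

```shell
#!/bin/sh
# countParagraphs (hypothetical name): print the number of paragraphs on stdin.
# Assumption: paragraphs are separated by one or more blank lines.
countParagraphs() {
    awk 'BEGIN { RS = "" }    # paragraph mode: blank lines separate records
         END   { print NR }'  # NR now counts paragraphs instead of lines
}

# Three paragraphs, the first spanning two lines.
printf 'First paragraph,\nsecond line.\n\nSecond paragraph.\n\nThird.\n' | countParagraphs
```

With the default RS, the same END rule would report the number of lines instead.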
Representation of Strings, Concatenation and Formatting the Output
Strings of characters in awk used in printing and as constant string-values are simply framed by double quotes ". The special character sequences \\, \", \t and \n represent the backslash, the double quote, the tab and the newline character in strings respectively. Otherwise, every character including the blank just represents itself.
Strings or the values of variables containing strings are concatenated by listing the strings or variables separated by blanks. For example, "aa" "bb" represents the same string as "aabb".
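The escape sequences and the concatenation rule can be tried out with a short sketch such as the one below. The function name demoStrings and the variable names first, second and joined are illustrative assumptions.

```shell
#!/bin/sh
# demoStrings (hypothetical name): string constants, escapes and concatenation.
demoStrings() {
    awk 'BEGIN {
        first  = "aa"
        second = "bb"
        joined = first second        # operands separated by a blank are concatenated
        if (joined == "aabb") print "same string: " joined
        print "tab:\tend"            # backslash-t prints a tab character
        print "quote: \"x\"  backslash: \\"
    }'
}

demoStrings
```

Because the program consists only of a BEGIN rule, awk runs it without reading any input.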
A string (framed by double quotes ") or a variable var containing a string can be printed using the statements 'print string;' or 'print var;' respectively. The statement print; simply prints the pattern space. Using the print function for printing is sufficient for most purposes. However, in awk one can also use a second printing function, printf, which acts similarly to the function printf of the programming language C. See [40, 31, 4, 5] for further details and consult the manual pages for awk and printf for more information on printf. One may be interested in printf if one wants to print the results of numerical computations, such as statistical evaluations, for further processing by a plotting program such as Mathematica [47] or gnuplot [17].
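As a rough sketch of such formatted numerical output, the following snippet prints one tab-separated row per input line (line number and line length in characters) and then the mean length with two decimals. The function name lineStats, the column layout and the "# mean" summary line are assumptions chosen for the example; only the awk printf function with the usual C-style %d and %.2f conversions is used.

```shell
#!/bin/sh
# lineStats (hypothetical name): line number and line length per input line,
# followed by the mean line length, all formatted with printf.
lineStats() {
    awk '{ total += length($0)
           printf "%d\t%d\n", NR, length($0) }   # two integer columns
         END { if (NR > 0) printf "# mean %.2f\n", total / NR }'
}

printf 'abc\nde\n' | lineStats
```

Unlike print, printf does not append a newline by itself, so each format string ends in \n.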
Application (finding a line together with its predecessor in a text): The word “because” is invariably used incorrectly by Japanese learners of English. Because “because” is often used by Japanese learners of English to begin sentences (or sentence fragments), it is necessary not only to print sentences containing the string Because or because, but also to locate and print the preceding sentence. The following program prints all lines in a file that match the pattern /[Bb]ecause/ as well as the lines that precede such lines. We shall refer to it as printPredecessorBecause.
#!/bin/sh
# printPredecessorBecause
awk '/[Bb]ecause/ { print previousLine "\n" $0 "\n\n" }
     { previousLine=$0 }' $1
Explanation: The symbol/string $0 represents the entire line or pattern space in awk. Thus, if the current line matches /[Bb]ecause/, then it is printed following its predecessor, which was previously saved in the variable previousLine. Afterwards, two newline characters are printed in order to structure the output. Should the first line of the input file match /[Bb]ecause/, then previousLine is automatically initialized to the empty string, so that the output starts with the first newline character that is printed. Finally, every line is saved in the variable previousLine, waiting for the next input line and cycle.
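The first-line case described in the explanation can be checked with a small sketch. It wraps the same awk program in a shell function that reads standard input instead of the file $1; the function name printPredecessor and the two sample sentences are made up for the illustration.

```shell
#!/bin/sh
# printPredecessor (hypothetical name): same awk program as printPredecessorBecause,
# but reading standard input instead of the file named by $1.
printPredecessor() {
    awk '/[Bb]ecause/ { print previousLine "\n" $0 "\n\n" }
         { previousLine = $0 }'
}

# The very first line matches, so previousLine is still the empty string:
# the first output line is empty and the matching line follows it.
printf 'Because it rained.\nWe stayed home.\n' | printPredecessor | head -n 2
```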
Fields and Field Separators
In the default mode, the fields of an input line are the full strings of non-white characters separated by blanks and tabs. They are addressed in the pattern space from left to right as field variables $(1), $(2), ... $(NF) where NF is a built-in