Variables and Arrays
A variable name or array name in awk is a string of letters; digits and underscores may also appear after the first character. An array entry has the format arrayName[index]. The index in arrayName is simply a string, which is a very flexible format, i.e., any array is by default an associative array and not necessarily a linear array indexed by integers. A typical example of the use of an associative array, showing the power of this concept, can be found in Section 12.2.2 of this chapter (countFrequencies). Numbers are simultaneously understood as strings in awk. All variables or array entries that are used are automatically initialized to the empty string. The empty string has numerical value zero (0), as does any string that does not begin with a number.
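The following is a minimal sketch of these rules, not the countFrequencies listing from Section 12.2.2. The shell function names word_counts and numeric_views are hypothetical and were chosen only for this illustration.

```shell
# Count how often each word occurs. The array count[] is indexed by
# strings (the words themselves), and each entry starts out empty,
# so count[$i]++ begins counting from zero.
word_counts() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w, count[w] }' | sort
}

# Show how strings and unused variables behave in numeric context.
numeric_views() {
  awk 'BEGIN { x = "abc"; print x + 0, "3" + 4, (u == ""), u + 0 }'
}

printf 'a b a\nc a b\n' | word_counts   # prints: a 3, b 2, c 1 (one per line)
numeric_views                           # prints: 0 7 1 0
```

The output of word_counts is piped through sort because awk does not guarantee any particular order when looping over an array with for (w in count).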
Built-In Variables
awk has a number of built-in variables, some of which have already been introduced. In the next few paragraphs, we list the most useful ones. The reader is referred to [4, 5, 40] or the UNIX manual pages for awk for a complete listing.
FILENAME: The built-in variable FILENAME contains the name of the current input file. awk recognizes - as the name of the standard input when it is the current input file. Using a pattern such as FILENAME==fName (cf. appendix A.2), processing by awk can depend upon which of several input files is being read. The input files follow the awk program as arguments and are processed in the order listed from left to right (e.g., awk 'awkProgram' fileOne fileTwo fileLast). See the listing of the program setIntersection below in Section 12.4.2.3 for a typical use of FILENAME.
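As an illustration of a FILENAME pattern, here is a small sketch under stated assumptions. It is not the setIntersection listing of Section 12.4.2.3. The function name common_lines, the awk variable first, and the temporary file names are hypothetical.

```shell
# Print the lines of the second file that also occur in the first file.
# While awk reads the first file, FILENAME equals its name, so the first
# rule only stores lines; the second rule then filters the second file.
common_lines() {
  awk -v first="$1" 'FILENAME == first { seen[$0] = 1; next }
                     $0 in seen' "$1" "$2"
}

dir=$(mktemp -d)
printf 'apple\npear\n'       > "$dir/fileOne"
printf 'pear\nplum\napple\n' > "$dir/fileTwo"
common_lines "$dir/fileOne" "$dir/fileTwo"   # prints: pear, apple (one per line)
rm -rf "$dir"
```

The sketch assumes that the two arguments name different files; if the same name is passed twice, every line is treated as belonging to the first file.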
FS: The built-in variable FS contains the field separator. Default: sequences of blanks and tabs. For example, FS should be reset to & (the separator for tables in TeX) if one wants to split the input line into fields separated by &. Such a reset is often done in the action B matched by the BEGIN pattern at the start of processing in an awk program.
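Resetting FS in a BEGIN action might look as follows. This is only a sketch; the function name second_column and the sample input are made up for the example.

```shell
# Split each input line at "&" and print the second field.
# FS is set in the BEGIN action, i.e., before the first line is read.
second_column() {
  awk 'BEGIN { FS = "&" } { print $2 }'
}

printf 'a&b&c\n1&2&3\n' | second_column   # prints: b, 2 (one per line)
```

Setting FS in BEGIN matters because a new value of FS only takes effect for records read after the assignment.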
NF: The built-in variable NF contains the number of fields in the current pattern space (input record). This is very important for looping over all fields in the pattern space using the for-loop construct of awk. A typical loop is given by:

    for (counter = 1; counter <= NF; counter++) { actionWith(counter) }

See the listing of the program findFourLetterWords below for a typical use of NF. Note that NF can be increased to “make room” for more fields, which can be filled with results of the current computation in the cycle.
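Both uses of NF, looping over the fields and making room for an extra field, can be sketched together. The example below is an assumption-laden illustration and not the findFourLetterWords listing; the function name append_sum is hypothetical.

```shell
# Sum all fields of each line, then store the sum in a new last field.
# Assigning to $(NF + 1) increases NF by one and rebuilds the record.
append_sum() {
  awk '{ sum = 0
         for (counter = 1; counter <= NF; counter++) sum += $counter
         $(NF + 1) = sum
         print }'
}

printf '1 2 3\n10 20\n' | append_sum   # prints: 1 2 3 6, then 10 20 30
```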
NR: The built-in variable NR contains the number of the most recent input record. Usually, this is the line number, provided the record separator RS has not been reset and NR itself has not been assigned another value. See the listing of the program context below for a typical use of NR.
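A very small sketch of NR in use, unrelated to the context listing, is to prefix every line with its number. The function name number_lines is hypothetical.

```shell
# Prefix each input line with its record number.
# With the default RS, NR is simply the line number.
number_lines() {
  awk '{ print NR ": " $0 }'
}

printf 'alpha\nbeta\n' | number_lines   # prints: 1: alpha, then 2: beta
```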
OFS: The built-in variable OFS contains the output field separator used in print. Default: blank. OFS is printed wherever a comma “,” is used in a print statement. See the listing of the program firstFiveFieldsPerLine below for an application.
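The effect of the comma in print can be seen in the following sketch. It is not the firstFiveFieldsPerLine listing; the function name show_ofs and the separator ";" are arbitrary choices for this example.

```shell
# Print the first two fields twice: once separated by a comma in the
# print statement (OFS is inserted), once merely juxtaposed (the
# strings are concatenated and OFS is not used).
show_ofs() {
  awk 'BEGIN { OFS = ";" } { print $1, $2; print $1 $2 }'
}

printf 'a b\n' | show_ofs   # prints: a;b, then ab
```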
ORS: The built-in variable ORS contains the output record separator string. It is appended to the output after each print statement. Default: newline character. ORS can be set to the empty string through ORS="". In that case, output lines are concatenated. If one sets ORS="\n\n", i.e., two newline characters (see next section), then the output is double-spaced. See the listing of the awk program in Section 12.5.2 for an application.
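The two settings of ORS mentioned above can be tried with the following sketch, which is independent of the program in Section 12.5.2. The function names join_lines and double_space are hypothetical.

```shell
# Concatenate all input lines: with ORS = "" nothing is appended after
# each print. A single newline is added at the very end for tidiness.
join_lines() {
  awk 'BEGIN { ORS = "" } { print } END { printf "\n" }'
}

# Double-space the input: every printed line is followed by two newlines.
double_space() {
  awk 'BEGIN { ORS = "\n\n" } { print }'
}

printf 'a\nb\n' | join_lines     # prints: ab
printf 'a\nb\n' | double_space   # prints: a, empty line, b, empty line
```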