• length(string) returns the length of string, i.e., the number of characters in string.
• index(bigstring,substring). Comment: This produces the position where substring starts in bigstring. If substring is not contained in bigstring, then the value 0 is returned. This allows analysis of fields beyond matching a substring.
• substr(string,n1,n2). Comment: This produces the substring of string that starts at the n1-th character and is at most n2 characters long. If fewer than n2 characters remain from the n1-th character on, or if n2 is omitted, then string is copied from the n1-th character to the end.
• split(string,arrayName,"c"). Comment: This splits string at every instance of the separator character c into the array arrayName and returns the number of fields encountered.
• string = sprintf(format,expr1,expr2 ...). Comment: This sets string to what is produced by printf format,expr1,expr2 ... In regard to the printing function printf in awk or C consult [40, 31, 4, 5] and the manual pages for awk and printf.
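The five functions listed above can be tried out on a fixed sample string. The following is a minimal sketch, not taken from the text: the sample string and the function name string_demo are assumptions made for illustration. Note that in POSIX awk the third argument of substr() is a length, not an end position.

```shell
#!/bin/sh
# Demonstrates the five awk string functions on a fixed sample string.
# The sample text and the name string_demo are illustrative assumptions.
string_demo() {
  awk 'BEGIN {
    s = "text mining with awk"
    print length(s)                 # number of characters in s
    print index(s, "mining")        # position where "mining" starts
    print index(s, "perl")          # 0, because "perl" does not occur in s
    print substr(s, 6, 6)           # 6 characters starting at position 6
    print substr(s, 18)             # from position 18 to the end
    n = split(s, words, " ")        # split at blanks into words[1..n]
    print n, words[n]               # field count and last field
    t = sprintf("%s has %d chars", words[1], length(words[1]))
    print t                         # string built by sprintf
  }'
}
string_demo
```

Run as written, this should print 20, 6, 0, mining, awk, "4 awk", and "text has 4 chars", each on its own line.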
Application (generating strings of context from a file): The next important example shows the use of the functions index() and substr() in awk. It generates all possible sequences of consecutive words of a certain length in a file. We shall refer to it as context. Suppose that a file $1 is organized in such a way that single words are on individual lines (e.g., the output of a pipe leaveOnlyWords | oneItemPerLine). context uses two arguments. The first argument $1 is supposed to be the name of the file that is organized as described above. The second argument $2 is supposed to be a positive integer. context then generates “context” of length $2 out of $1. In fact, all possible sequences of length $2 of consecutive words in $1 are generated and printed.
1: #!/bin/sh
2: # context
3: # First argument $1 is input file name.
4: # Second argument $2 is context-length.
5: awk 'BEGIN { cLength='$2'+0 }
6:   NR==1 { c=$0 }
7:   NR>1 { c=c" "$0 }
8:   NR>cLength { c=substr(c,index(c," ")+1) }
9:   NR>=cLength { print c }' $1
Explanation: Suppose the above program is invoked as context sourceFile 11. Then, $2=11. In line 5, the awk-variable cLength is set to 11. Here, the operation +0 forces any string contained in the second argument $2 to context, even the empty string, to be considered as a number in the remainder of the program. In the second command of the awk program (line 6), the context c is set to the first word (i.e., input line). In the third command (line 7), any subsequent word (input line) other than the first is appended to c, separated by a blank. The fourth statement (line 8) works as follows: after 12 words are collected in c, the first is cut away by using the position of the first blank, i.e., index(c," "), and reproducing c from index(c," ")+1 until the end. Thus, the word at the very left of c is lost. Finally (line 9), the context c is printed once at least cLength (here 11) words have been read.
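To see the sliding window at work on a small input, the same logic can be wrapped in a shell function and run on a five-word file. This is a sketch under assumptions rather than the author's script: the name context_sketch and the sample words are hypothetical, the context length is passed with awk's -v option instead of being spliced into the program text, and mktemp is assumed to be available.

```shell
#!/bin/sh
# context_sketch FILE N: print every run of N consecutive words from FILE,
# where FILE holds one word per line. The name context_sketch is hypothetical.
context_sketch() {
  awk -v cLength="$2" '
    BEGIN         { cLength += 0 }                       # force a numeric value
    NR == 1       { c = $0 }                             # start with the first word
    NR > 1        { c = c " " $0 }                       # append later words after a blank
    NR > cLength  { c = substr(c, index(c, " ") + 1) }   # drop the leftmost word
    NR >= cLength { print c }                            # window is full, so print it
  ' "$1"
}

# Example run on a five-word sample file with context length 3.
tmp=$(mktemp)
printf '%s\n' the quick brown fox jumps > "$tmp"
context_sketch "$tmp" 3
rm -f "$tmp"
```

With this input the run should print "the quick brown", "quick brown fox", and "brown fox jumps". Passing the length with -v keeps the shell argument out of the awk source, so an unusual value for the second argument cannot alter the program itself.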