length(string) returns the length of string, i.e., the number of characters in string.
index(bigstring, substring). Comment: This produces the position where substring
starts in bigstring. If substring is not contained in bigstring, then the value 0 is
returned. This allows analysis of fields beyond matching a substring.
substr(string, n1, n2). Comment: This produces the substring of string that
starts at the n1-th character and is at most n2 characters long. If n2 is omitted,
or if fewer than n2 characters remain after position n1, then string is copied
from the n1-th character to the end.
split(string, arrayName, "c"). Comment: This splits string at every instance of
the separator character c into the array arrayName and returns the number of fields
encountered.
string = sprintf(format, expr1, expr2, ...). Comment: This sets
string to what is produced by printf format, expr1, expr2, ... In regard to the
printing function printf in awk or C, consult [40, 31, 4, 5] and the manual pages for
awk and printf.
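The functions listed above can be tried out in a single awk call. The following is a minimal sketch only: the sample string "information retrieval", the array name parts, and the shell variable demo are illustrative assumptions and do not come from the text.

```shell
#!/bin/sh
# Sketch: exercise length, index, substr, split and sprintf in one awk call.
# The sample string and all variable names are illustrative assumptions.
demo=$(awk 'BEGIN {
    s = "information retrieval"
    print length(s)                 # number of characters in s
    print index(s, "retrieval")     # position where the substring starts
    print index(s, "xyz")           # 0, since "xyz" does not occur in s
    print substr(s, 1, 4)           # 4 characters starting at position 1
    print substr(s, 13)             # from position 13 to the end
    n = split("a:b:c", parts, ":")  # fills parts[1], parts[2], parts[3]
    print n, parts[2]               # number of fields and the second field
    t = sprintf("%s has %d characters", "awk", length("awk"))
    print t
}')
printf '%s\n' "$demo"
```

Run as is, this should print 21, 13, 0, info, retrieval, "3 b", and "awk has 3 characters" on separate lines.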
Application (generating strings of context from a file): The next important
example shows the use of the functions index() and substr() in awk. It generates all
possible sequences of consecutive words of a certain length in a file. We shall refer to
it as context. Suppose that a file $1 is organized in such a way that single words are
on individual lines (e.g., the output of a pipe leaveOnlyWords | oneItemPerLine).
context uses two arguments. The first argument $1 is supposed to be the name of
the file that is organized as described above. The second argument $2 is supposed
to be a positive integer. context then generates “context” of length $2 out of $1.
In fact, all possible sequences of length $2 of consecutive words in $1 are generated
and printed.
1: #!/bin/sh
2: # context
3: # First argument $1 is input file name.
4: # Second argument $2 is context-length.
5: awk 'BEGIN { cLength='$2'+0 }
6: NR==1 { c=$0 }
7: NR>1 { c=c" "$0 }
8: NR>cLength { c=substr(c,index(c," ")+1) }
9: NR>=cLength { print c }' $1
Explanation: Suppose the above program is invoked as context sourceFile 11.
Then $2 = 11. In line 5, the awk variable cLength is set to 11. The operation
+0 forces whatever string is passed to context as the second argument $2, even the
empty string, to be treated as a number in the remainder of the program. In the second
command of the awk program (line 6), the context c is set to the first word (i.e., the
first input line). In the third command (line 7), every subsequent word (input line)
is appended to c, separated by a blank. The fourth statement (line 8) works
as follows: once 12 words have been collected in c, the first is cut away by using the
position of the first blank, i.e., index(c," "), and reproducing c from index(c," ")+1
until the end. Thus, the word at the very left of c is lost. Finally (line 9), the context c
is printed if it contains at least 11 (= cLength) words.
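To see the sliding window at work on a small input, the same logic can be run with a context length of 3. This is a sketch under stated assumptions, not the listing itself: the program is wrapped in a shell function that reads from standard input, and the length is passed with awk's -v option instead of being spliced into the program text. The sample words are made up for illustration.

```shell
#!/bin/sh
# Sketch: the context logic wrapped in a shell function so that it can be
# run without creating files. The sample words and the context length 3
# are illustrative assumptions.
context() {
    # $1: context length; the words are read from standard input, one per line.
    awk -v cLength="$1" '
        BEGIN         { cLength = cLength + 0 }              # force a number
        NR == 1       { c = $0 }                             # first word
        NR > 1        { c = c " " $0 }                       # append next word
        NR > cLength  { c = substr(c, index(c, " ") + 1) }   # drop leftmost word
        NR >= cLength { print c }                            # print the context
    '
}

result=$(printf '%s\n' the quick brown fox jumps | context 3)
printf '%s\n' "$result"
```

For the five input words this should print three lines: "the quick brown", "quick brown fox", and "brown fox jumps". Each new word enters on the right, and from the fourth word on the leftmost word is dropped.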