Application (average and standard deviation): The following program determines
the sum, average and standard deviation of the frequencies in a vector $1.
#!/bin/sh
awk '/[^ ]/ { s1+=$(NF); s2+=$(NF)*$(NF) }
END { print s1 , s1/NR , sqrt(s2*NR-s1*s1)/NR }' $1
Explanation: The awk program acts only on non-white lines, since the non-white
pattern /[^ ]/ must be matched. awk automatically initializes s1 and s2 to 0.
s1+=$(NF) adds the last field of every line to s1, and s2+=$(NF)*$(NF) adds the
square of the last field of every line to s2. Thus, at the end of the program we have
$\texttt{s1} = \sum_{n=1}^{N} \texttt{\$(NF)}_n$ and $\texttt{s2} = \sum_{n=1}^{N} (\texttt{\$(NF)}_n)^2$.
In the END line, the sum s1, the average s1/NR and the standard deviation
(cf. [16, p. 81]) are printed.
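As a quick check of the formula, the following sketch runs the same awk program on a small input. The temporary file and the eight frequency values are hypothetical, chosen only so that the result is easy to verify by hand (sum 40, average 5, standard deviation 2).

```shell
#!/bin/sh
# Hypothetical sample: an item in field 1, its frequency in the last field.
tmp=$(mktemp)
printf '%s\n' 'a 2' 'b 4' 'c 4' 'd 4' 'e 5' 'f 5' 'g 7' 'h 9' > "$tmp"

# Same computation as the program above:
# sum, average, and standard deviation sqrt(s2*N - s1^2)/N.
stats=$(awk '/[^ ]/ { s1 += $(NF); s2 += $(NF)*$(NF) }
END { print s1, s1/NR, sqrt(s2*NR - s1*s1)/NR }' "$tmp")

echo "$stats"
rm -f "$tmp"
```

The expression sqrt(s2*NR-s1*s1)/NR equals sqrt(s2/N - (s1/N)^2), that is, the standard deviation that divides by N rather than N-1. Note also that NR counts every input line, so the divisor matches the number of summed values only if the file contains no blank lines.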
Set Operations
In this section, we show how to implement set operations using awk. Set
operations as well as vector operations are extremely useful in comparing results from
different analyses performed with the methods presented thus far.
Application (set intersection): The next program implements set intersection (see
footnote 18). We shall refer to it as setIntersection. If aFile and bFile are organized
such that items (= set elements) are listed on separate lines, then it is used as
setIntersection aFile bFile. setIntersection can be used to measure overlap in use
of vocabulary. Consult also man comm.
#!/bin/sh
# setIntersection
awk 'FILENAME=="'$1'" { n[$0]=1; next };
n[$0]==1' $1 $2
Explanation: awk can accept and distinguish more than one input file after the
program string. This property is utilized here. Suppose the command is invoked as
setIntersection aFile bFile. This means $1 = aFile and $2 = bFile in the above.
While the awk program reads its first argument aFile, it only builds an
associative array n, indexed by the lines $0 of aFile, with the constant value 1 for
every element. When the awk program reads the second file bFile, only
those lines $0 of bFile are printed for which the corresponding n[$0] was set to
1 while reading aFile. For elements that occur only in bFile, the conditional itself
creates n[$0] with an uninitialized value, which compares equal to 0, so the
conditional is false.
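The behaviour described above can be tried on two small lists. This is a minimal sketch under stated assumptions: the file contents are hypothetical, and the test FILENAME=="'$1'" is swapped for the common awk idiom NR==FNR, which is true only while the first input file is being read and avoids splicing the file name into the program text.

```shell
#!/bin/sh
# Hypothetical vocabulary lists, one item per line.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/aFile"
printf '%s\n' banana cherry date  > "$dir/bFile"

# First rule: remember every line of the first file, then skip to the next line.
# Second rule: while reading the second file, print lines already remembered.
common=$(awk 'NR==FNR { n[$0] = 1; next }
n[$0] == 1' "$dir/aFile" "$dir/bFile")

echo "$common"
rm -rf "$dir"
```

The lines are printed in the order in which they appear in the second file, and a line that occurs several times in the second file is printed each time.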
If one changes the final conditional n[$0]==1 in setIntersection to n[$0]==0,
then this implements set-complement. If such a procedure is named setComplement,
then setComplement aFile bFile computes all elements from bFile that are not
in aFile.
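A sketch of the setComplement variant on hypothetical data follows. The only change to the logic is the final conditional n[$0]==0. Passing the first file name through -v (the variable name first is an assumption made for this example) is one way to avoid embedding it in the quoted program.

```shell
#!/bin/sh
# Hypothetical vocabulary lists, one item per line.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/aFile"
printf '%s\n' banana cherry date  > "$dir/bFile"

# Lines of the first file are stored in n; lines of the second file are
# printed only if they were never stored (n[$0] compares equal to 0).
only_b=$(awk -v first="$dir/aFile" 'FILENAME == first { n[$0] = 1; next }
n[$0] == 0' "$dir/aFile" "$dir/bFile")

echo "$only_b"
rm -rf "$dir"
```

With these inputs only date is printed, since it is the single item of bFile that does not occur in aFile.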
18 Note that adjustBlankTabs fName | sort -u - converts any file fName into a
set where every element occurs only once. In fact, sort -u sorts a file and
prints each distinct line only once. Consequently, cat aFile bFile | adjustBlankTabs
- | sort -u - implements set union.
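The union pipeline can be sketched as follows. The input files are hypothetical and assumed to be normalized already (no stray blanks or tabs), so the adjustBlankTabs step is left out. LC_ALL=C is set only to make the sort order predictable.

```shell
#!/bin/sh
# Hypothetical vocabulary lists, one item per line, already normalized.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/aFile"
printf '%s\n' banana cherry date  > "$dir/bFile"

# Concatenate both sets, then sort and drop duplicate lines.
union=$(cat "$dir/aFile" "$dir/bFile" | LC_ALL=C sort -u)

echo "$union"
rm -rf "$dir"
```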