Application (average and standard deviation): The following program determines the sum, the average and the standard deviation of the frequencies in a vector stored in the file $1, taking the frequency from the last field of each line.
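#!/bin/sh
# Sum, average and standard deviation of the last field of each
# non-white line of file $1.
# Note: NR counts every input line, so $1 should contain no blank lines.
awk '/[^ ]/ { s1 += $(NF); s2 += $(NF)*$(NF) }
     END    { print s1, s1/NR, sqrt(s2*NR - s1*s1)/NR }' "$1"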
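Explanation: The awk program only acts on non-white lines, since the non-white pattern /[^ ]/ must be matched. s1 and s2 are initialized to 0 automatically by awk. s1+=$(NF) adds the last field of every line to s1, and s2+=$(NF)*$(NF) adds the square of the last field of every line to s2. Thus, at the end of the program we have

$$ s1 = \sum_{n=1}^{N} \$(NF)_n \qquad \text{and} \qquad s2 = \sum_{n=1}^{N} \bigl(\$(NF)_n\bigr)^2 , $$

where $(NF)_n denotes the last field of the n-th non-white line and N is the number of such lines. In the END line, the sum s1, the average s1/NR and the standard deviation (cf. [16, p. 81]) are printed. If the file contains no blank lines, then N = NR, and the printed value is indeed the (population) standard deviation:

$$ \frac{\sqrt{N \cdot s2 - s1^2}}{N} = \sqrt{\frac{s2}{N} - \left(\frac{s1}{N}\right)^2} . $$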
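The program above divides by NR, which counts every input line, including blank lines that the pattern /[^ ]/ skips. The following is a minimal sketch, assuming a separate counter n is acceptable, that divides only by the number of matched lines. The function name stats is hypothetical and not taken from the text.

```shell
#!/bin/sh
# stats is a hypothetical name. It computes the sum, the average and the
# population standard deviation of the last field of every non-white line,
# dividing by n (the number of lines matching /[^ ]/) rather than by NR.
stats() {
  awk '/[^ ]/ { n++; s1 += $(NF); s2 += $(NF)*$(NF) }
       END    { if (n > 0) print s1, s1/n, sqrt(s2*n - s1*s1)/n }' "$@"
}

# Usage: three word frequencies with one blank line in between.
printf 'the 4\n\nof 2\nand 6\n' | stats    # prints: 12 4 1.63299
```

For the same input, the original program would divide by NR = 4 and print 12 3 2.23607, because the blank line is counted.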
Set Operations
In this section, we show how to implement set operations using awk. Set operations as well as vector operations are extremely useful in comparing results from different analyses performed with the methods presented thus far.
Application (set intersection): The next program implements set intersection (see footnote 18). We shall refer to it as setIntersection. If aFile and bFile are organized such that items (= set elements) are listed on separate lines, then it is used as setIntersection aFile bFile. setIntersection can be used to measure the overlap in vocabulary use. Consult also man comm.
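#!/bin/sh
# setIntersection: print the lines of the second file that also occur in the first
awk -v first="$1" '
  FILENAME == first { n[$0] = 1; next }   # first file: remember each line
  n[$0] == 1                              # second file: print lines seen before
' "$1" "$2"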
Explanation: awk can accept and distinguish more than one input file after the program string. This property is utilized here. Suppose this command is invoked as setIntersection aFile bFile. This means $1 = aFile and $2 = bFile in the above. As long as this awk program reads its first argument aFile, it only creates an associative array n, indexed by the lines $0 of aFile, with constant value 1 for the elements of the array. When the awk program reads the second file bFile, it prints only those lines $0 of bFile for which the corresponding n[$0] was set to 1 while reading aFile. For elements that occur only in bFile, n[$0] is created by the conditional itself with awk's default value 0, so the conditional is false.
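To try setIntersection without a separate script file, its awk program can be wrapped in a shell function. This sketch passes the first file name with awk's -v option (an assumption, not the quoting used in the text) and compares the result with comm -12, which the text points to via man comm; comm requires sorted input, hence the sort -u calls. The file contents are made-up examples.

```shell
#!/bin/sh
# setIntersection as a shell function; the first file name is passed to awk
# with -v instead of being spliced into the program text.
setIntersection() {
  awk -v first="$1" '
    FILENAME == first { n[$0] = 1; next }   # first file: record its lines
    n[$0] == 1                              # second file: print recorded lines
  ' "$1" "$2"
}

# Two small made-up vocabularies, one word per line.
dir=$(mktemp -d)
printf 'the\ncat\nsat\n' > "$dir/aFile"
printf 'a\ncat\nsat\ndown\n' > "$dir/bFile"

setIntersection "$dir/aFile" "$dir/bFile"   # prints cat and sat

# comm -12 hides the lines unique to either file; it expects sorted input.
sort -u "$dir/aFile" > "$dir/a.sorted"
sort -u "$dir/bFile" > "$dir/b.sorted"
comm -12 "$dir/a.sorted" "$dir/b.sorted"    # prints cat and sat
rm -r "$dir"
```

The awk version keeps the order of bFile and needs no sorting, whereas comm prints its result in sorted order.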
If one changes the final conditional n[$0]==1 in setIntersection to n[$0]==0, then this implements set complement. If such a procedure is named setComplement, then setComplement aFile bFile computes all elements of bFile that are not in aFile (the relative complement of aFile in bFile).
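The modification described above can be sketched as a shell function in the same style. Apart from the name setComplement, which the text suggests, the function wrapper, the -v option for the first file name and the sample data are assumptions for illustration.

```shell
#!/bin/sh
# setComplement: print the lines of the second file that do not occur in
# the first one (second file minus first file).
setComplement() {
  awk -v first="$1" '
    FILENAME == first { n[$0] = 1; next }   # first file: record its lines
    n[$0] == 0                              # second file: print unrecorded lines
  ' "$1" "$2"
}

dir=$(mktemp -d)
printf 'the\ncat\nsat\n' > "$dir/aFile"
printf 'a\ncat\nsat\ndown\n' > "$dir/bFile"
setComplement "$dir/aFile" "$dir/bFile"   # prints a and down
rm -r "$dir"
```

Swapping the arguments gives the lines of aFile missing from bFile (here: the). A line repeated in the second file is printed as often as it occurs, so both files should already be sets.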
Footnote 18: Note that adjustBlankTabs fName | sort -u - converts any file fName into a set in which every element occurs only once. In fact, sort -u sorts a file and prints each distinct line only once. Consequently, cat aFile bFile | adjustBlankTabs - | sort -u - implements set union.
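The union recipe above can be sketched without adjustBlankTabs, which is defined elsewhere and not reproduced here, under the assumption that both files already hold one clean item per line. Since the text mentions measuring vocabulary overlap, the sketch also computes one common overlap measure, the Jaccard coefficient (size of the intersection divided by size of the union); this measure is chosen here for illustration, as the text does not prescribe one. The names setUnion and overlap are hypothetical.

```shell
#!/bin/sh
# setUnion and overlap are hypothetical names. The adjustBlankTabs step of
# the text is omitted: both files are assumed to hold one clean item per line.
setUnion() {
  cat "$1" "$2" | sort -u
}

# Jaccard coefficient of the distinct lines of two files:
# size of intersection / size of union, printed after both sizes.
overlap() {
  awk -v first="$1" '
    FILENAME == first { a[$0] = 1; next }
    { b[$0] = 1 }
    END {
      for (w in a) { u++; if (w in b) i++ }
      for (w in b) if (!(w in a)) u++
      if (u > 0) printf "%d %d %.4f\n", i, u, i / u
    }' "$1" "$2"
}

dir=$(mktemp -d)
printf 'the\ncat\nsat\n' > "$dir/aFile"
printf 'a\ncat\nsat\ndown\n' > "$dir/bFile"
setUnion "$dir/aFile" "$dir/bFile"   # prints a, cat, down, sat, the
overlap "$dir/aFile" "$dir/bFile"    # prints 2 5 0.4000
rm -r "$dir"
```

Counting distinct lines inside awk avoids sorting, while setUnion follows the footnote and relies on sort -u.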