Application (scalar multiplication): The next program scalarMultiplication
implements scalar multiplication. If aFile is a vector and n is a number, then it is
used as scalarMultiplication aFile n .
#!/bin/sh
# scalarMultiplication
# First argument $1 is vector. Second argument $2 is scalar.
awk '{
  $(NF)*=('"$2"'+0) ;
  print
}' "$1"
Explanation: sh splices the scalar $2 into the awk program by concatenating three
strings: ' { $(NF)*=(' , the content of the second argument $2 of the command,
and '+0) ; print } ' . Then, for every input line of the file $1 , the frequency
stored in the last field $(NF) is multiplied by the scalar, and the resulting
line is printed.
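As an alternative to splicing, the scalar can also be handed to awk as a variable with the standard -v option. The following is a minimal sketch of that variant, not the program given above; the function name scalar_mult and the sample data are made up for illustration, and a vector is assumed to be a file whose last field on each line is the frequency.

```shell
#!/bin/sh
# Sketch: scalar multiplication with the scalar passed via awk -v.
# The sample vector below is hypothetical illustration data.
vec=$(mktemp)
printf 'the 10\nof 4\ncat -2\n' > "$vec"

scalar_mult() {
    # $1: vector file, $2: scalar
    awk -v s="$2" '{ $(NF) *= (s + 0); print }' "$1"
}

result=$(scalar_mult "$vec" 3)
printf '%s\n' "$result"
# prints:
# the 30
# of 12
# cat -6
rm -f "$vec"
```

With -v the awk program text stays fixed, so the content of the second argument cannot change the program itself.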
Application (absolute value): The next program computeAbsoluteValue computes
the absolute value of the frequencies of items in a vector.
#!/bin/sh
# computeAbsoluteValue
awk '$(NF)<0 { $(NF)=-$(NF) }
{ print }' "$1"
Explanation: If the last field of an input line is negative, then its sign is reversed.
Next, every line is printed.
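A short worked example may make the effect concrete. The sketch below applies the same two rules to a small made-up vector; the function name abs_value and the sample items are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: absolute value of the last field, on hypothetical sample data.
vec=$(mktemp)
printf 'gain 3\nloss -7\nflat 0\n' > "$vec"

abs_value() {
    # $1: vector file
    awk '$(NF) < 0 { $(NF) = -$(NF) }
         { print }' "$1"
}

result=$(abs_value "$vec")
printf '%s\n' "$result"
# prints:
# gain 3
# loss 7
# flat 0
rm -f "$vec"
```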
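Application (sign function): Similar in structure to the previous program, the next
program frequencySign replaces every frequency in a vector by its sign: positive
frequencies become 1, negative frequencies become -1, and zero is left unchanged.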
#!/bin/sh
# frequencySign
awk '$(NF)>0 {$(NF)=1}; $(NF)<0 {$(NF)=-1}; {print}' "$1"
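The behaviour of the three rules can be traced on a small example. This is a sketch on made-up data; the function name frequency_sign and the sample items are hypothetical.

```shell
#!/bin/sh
# Sketch: sign of the last field, on hypothetical sample data.
vec=$(mktemp)
printf 'gain 3\nloss -7\nflat 0\n' > "$vec"

frequency_sign() {
    # $1: vector file
    awk '$(NF) > 0 { $(NF) = 1 }
         $(NF) < 0 { $(NF) = -1 }
         { print }' "$1"
}

result=$(frequency_sign "$vec")
printf '%s\n' "$result"
# prints:
# gain 1
# loss -1
# flat 0
rm -f "$vec"
```

A line whose frequency is 0 matches neither of the first two patterns and is therefore printed as it is.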
Application (selecting frequencies): The next program removes low frequencies
from a file $1 that is a vector and keeps the rest. The limit value $2 is the second
argument to the program. We shall refer to it as filterHighFrequencies . It can be
used to obtain files of very common words, which serve a grammatical function but
say little about the context.
#!/bin/sh
# filterHighFrequencies
# First argument $1: vector. Second argument $2: cut-off threshold.
awk '$(NF)>='"$2" "$1"
Explanation: $2 stands for the second argument to filterHighFrequencies .
If this program is invoked as filterHighFrequencies fname 5 , then sh passes
$(NF)>=5 to awk as the selecting address pattern. Consequently, all lines of fname
whose last field is greater than or equal to 5 are printed.
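The selection can be tried out on a small vector as follows. This sketch passes the threshold with awk -v rather than splicing it into the pattern, which is a variation on the program above; the function name filter_high and the sample data are assumptions.

```shell
#!/bin/sh
# Sketch: keep lines whose last field is at least the threshold.
# The sample vector is hypothetical illustration data.
vec=$(mktemp)
printf 'the 10\nof 5\ncat 2\n' > "$vec"

filter_high() {
    # $1: vector file, $2: cut-off threshold
    # Adding 0 forces a numeric comparison on both sides.
    awk -v t="$2" '$(NF) + 0 >= t + 0' "$1"
}

result=$(filter_high "$vec" 5)
printf '%s\n' "$result"
# prints:
# the 10
# of 5
rm -f "$vec"
```

Because the pattern has no action, awk applies its default action and prints every selected line unchanged.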
Application: The vector operations presented above make it possible to analyse and
compare, e.g. , the vocabulary use of students in a class in a large variety of ways
(vocabulary use of a single student vs. the class or vs. a dedicated list of words,
similarity/distinction of vocabulary use among students, computation of probability
distributions over vocabulary use (normalization), etc.).
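The normalization mentioned above can be sketched as dividing every frequency by the sum of all frequencies. The following is one possible approach under that assumption, not a program from the text; the function name normalize and the sample data are hypothetical. The file is read twice: the first pass accumulates the sum, the second pass rescales each line.

```shell
#!/bin/sh
# Sketch: turn a frequency vector into a probability distribution.
# The sample vector is hypothetical illustration data.
vec=$(mktemp)
printf 'a 1\nb 1\nc 2\n' > "$vec"

normalize() {
    # $1: vector file, passed to awk twice.
    # NR==FNR holds only while the first copy is being read.
    awk 'NR == FNR { sum += $(NF); next }
         sum != 0  { $(NF) = $(NF) / sum; print }' "$1" "$1"
}

result=$(normalize "$vec")
printf '%s\n' "$result"
# prints:
# a 0.25
# b 0.25
# c 0.5
rm -f "$vec"
```

If the frequencies sum to zero, this sketch prints nothing rather than dividing by zero.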