discriminant analysis picked out 35 of them, or 73%. Of the 64 we thought were
non-reviews, it picked out 57, or 89%. DocuScope's analysis also classified as
tech reviews 7 texts that we had not, and it left out 13 texts that we had
classified as reviews. This produced a total of 20 apparently misclassified texts
from the original set of 112, an apparent error rate of 18%.
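For readers who want to trace the arithmetic, the following is a minimal sketch, a reconstruction from the counts reported above rather than anything produced by DocuScope itself. It assumes, as the percentages imply, that 48 of the 112 texts were the ones we had classified as tech reviews; the variable names are illustrative only.

    # Reconstruction of the reported counts; assumes 48 of 112 texts were
    # classified as tech reviews by rhetorical judgment (112 - 64 = 48).
    reviews_total = 48       # texts we classified as tech reviews
    non_reviews_total = 64   # texts we classified as non-reviews

    true_positives = 35      # reviews DocuScope also picked out
    false_negatives = 13     # reviews DocuScope left out
    true_negatives = 57      # non-reviews DocuScope also excluded
    false_positives = 7      # non-reviews DocuScope classified as reviews

    total = reviews_total + non_reviews_total            # 112
    misclassified = false_positives + false_negatives    # 20

    print(f"reviews recovered:     {true_positives / reviews_total:.0%}")      # 73%
    print(f"non-reviews recovered: {true_negatives / non_reviews_total:.0%}")  # 89%
    print(f"apparent error rate:   {misclassified / total:.0%}")               # 18%
    print(f"flagged as reviews:    {true_positives + false_positives}")        # 42
    print(f"removed from the pile: {true_negatives + false_negatives}")        # 70

The last two totals, 42 and 70, are the figures discussed in the next paragraph.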
For a rhetorician wanting to identify examples of a particular genre in a large
archive, these results are promising. Overall, DocuScope identified 42 texts in a
set of 112 texts as tech reviews, eliminating 70 from consideration. Of these 42,
most were identified as tech reviews by our rhetorical judgment. Presumably a
bit more rhetorical judgment would eliminate the 7 texts DocuScope erroneously
added. Of the 70 taken out of consideration by DocuScope, most did not appear
to warrant further consideration; but there are 13 misclassified texts that might
have benefited from further examination.
Thus if we want to be absolutely certain of identifying every tech review in an archive,
DocuScope appears to have its limits. If this test is any indication, there are
some texts (about 10% of the original set) that it is not going to include. But if
all we need is a large group of texts that appear to be tech reviews, DocuScope
looks good: It gives us a few texts that we will end up throwing out in the long
run, but it has also cut down from 112 to 42 the number of texts we need to look
at in the first place. It has thereby taken a task that originally looked impossible
for the rhetorician and made it more manageable, though still large.
The story of DocuScope's performance does not, however, stop with this
analysis. Part of the problem with using rhetorical intuition to identify members
of a particular genre in an archive is that the rhetorician is a fallible creature.
Faced with 112 potential reviews and limited time, we did not do a thorough
rhetorical analysis of each text to make up our minds how to classify it. Instead,
we looked at the abstracts, read through the first paragraphs, and skimmed
through the rest to make our decision. Despite this cursory review, the task
took the better part of an afternoon.
To get a sense of how well we had done under these conditions, we took a
second look at the twenty texts that DocuScope had apparently misclassified.
Reading more carefully, looking for elements of mixed genres, we classified each
text a second time according to its predominant elements. This secondary review
resulted in changing our minds in 9 out of 20 cases. Two of the texts that we
had classified as non-reviews, we decided were actually reviews; seven of those
we had classified as reviews, we decided were non-reviews. When we factored
these corrections in, our agreement with DocuScope rose from 82% to 90%, with
DocuScope and rhetorical intuition disagreeing in the final analysis on only 11
texts, less than 10% of the original sample.
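As a quick check on these figures, the same kind of sketch, again a reconstruction from the counts given above rather than part of the original analysis, recomputes the agreement before and after the second reading:

    # Reconstruction of the agreement arithmetic from the counts given above.
    total_texts = 112
    initial_disagreements = 20   # texts DocuScope apparently misclassified
    minds_changed = 9            # 2 non-reviews reread as reviews + 7 reviews reread as non-reviews

    final_disagreements = initial_disagreements - minds_changed   # 11

    print(f"initial agreement: {(total_texts - initial_disagreements) / total_texts:.0%}")  # 82%
    print(f"final agreement:   {(total_texts - final_disagreements) / total_texts:.0%}")    # 90%
    print(f"final disagreement share: {final_disagreements / total_texts:.1%}")             # 9.8%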
6.3 Understanding Underlying Language and Culture in the Other Journalistic Pieces
So far, we have suggested that DocuScope may be more efficient and even more
accurate than the intuition of the rhetorical analyst working unaided. These
results are all relevant to getting the appropriate texts into the hands of the