5.4 Heuristic 4: Prefer Long over Short Strings - But Not Too Long
This heuristic is already implicit in the first three. Lengthening existing strings
is the main operation through which coded strings are tested to rule out
overcommitted classifications, to ensure conservative codings, and to diversify
strings across different contexts. By definition, longer strings absorb more
context than shorter strings, which makes them useful lenses for understanding
the commitments of shorter strings. This fourth heuristic, incidentally, is built
into the string-matching algorithm, which, from the same starting point, skips
over shorter matches when longer ones are available. Notice in Table 2 how
lengthening the motion word "jump" reveals a range of functions that one could
not easily have predicted from the single word.
Table 2. Variant Lengthening off the Word “Jump” Produces Different Functions.

String                               Function
While jumping on the walls           continuous motion
She jumped at the opportunity        positive standard
They jumped to the conclusion        negative standard
He jumped around the house           motion
He'll get a jump on the problem      positive standard
They jumped all over him             negative affect
He is good at jumping rope           generic motion
I've jumped around the country       autobiographical
He jumped around the country         scene shift
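The longest-match behavior described above can be illustrated with a minimal sketch. It is not DocuScope's actual matcher. The catalog below is a hypothetical miniature whose keys are trimmed from the Table 2 strings, the single-word entry for "jumped" is an assumption added for contrast, and the names CATALOG, tokenize, and tag_longest are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature catalog: token tuples mapped to function labels.
# Multiword keys are trimmed from Table 2; the one-word key is an assumption.
CATALOG: Dict[Tuple[str, ...], str] = {
    ("jumped",): "motion",
    ("jumped", "around"): "motion",
    ("jumped", "at", "the", "opportunity"): "positive standard",
    ("jumped", "to", "the", "conclusion"): "negative standard",
    ("jumped", "all", "over"): "negative affect",
}


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tag_longest(tokens: List[str],
                catalog: Dict[Tuple[str, ...], str]) -> List[Tuple[str, str]]:
    """Tag tokens left to right, preferring the longest catalog match."""
    max_len = max(len(key) for key in catalog)
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, so shorter matches that start
        # at the same position are skipped whenever a longer one exists.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in catalog:
                tagged.append((" ".join(key), catalog[key]))
                i += n
                break
        else:
            i += 1  # no catalog string starts at this token
    return tagged


print(tag_longest(tokenize("She jumped at the opportunity"), CATALOG))
print(tag_longest(tokenize("They jumped all over him"), CATALOG))
```

In this sketch "jumped" alone would be read as motion, but the longer entries starting at the same token take precedence and yield a different function.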
Still, longer is not always better, especially when a long string's frequency of
recurrence approaches zero. Before admitting a string, we asked ourselves
whether it had a chance of being reused across other texts and writers. If we
could not answer this reuse question positively, we did not include the string.
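The reuse question above was answered by human judgment. As a rough computational analogue only, one could count how many texts and how many distinct writers use a candidate string in a sample corpus. The sketch below assumes a corpus of (writer, text) pairs; the thresholds, the sample sentences, and the names occurs, reuse_counts, and admit are all hypothetical.

```python
from typing import List, Tuple

Corpus = List[Tuple[str, str]]  # (writer id, text); an assumed format


def occurs(tokens: List[str], phrase: List[str]) -> bool:
    """True if phrase appears as a contiguous token run in tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def reuse_counts(candidate: str, corpus: Corpus) -> Tuple[int, int]:
    """Return (number of texts, number of distinct writers) using candidate."""
    phrase = candidate.lower().split()
    texts = 0
    writers = set()
    for writer, text in corpus:
        if occurs(text.lower().split(), phrase):
            texts += 1
            writers.add(writer)
    return texts, len(writers)


def admit(candidate: str, corpus: Corpus,
          min_texts: int = 2, min_writers: int = 2) -> bool:
    """Admit a string only if it recurs across enough texts and writers."""
    texts, writers = reuse_counts(candidate, corpus)
    return texts >= min_texts and writers >= min_writers


# Invented sample corpus, for illustration only.
SAMPLE: Corpus = [
    ("a", "She jumped at the opportunity to travel"),
    ("b", "He jumped at the opportunity and won"),
    ("a", "They jumped at the opportunity to travel again"),
]

print(admit("jumped at the opportunity", SAMPLE))
print(admit("jumped at the opportunity to travel", SAMPLE))
```

In this sample the shorter string recurs across both writers, while the longer variant is confined to one writer and would be left out under the assumed thresholds.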
6 Identifying Genres: Exploring Language and Culture in the Tech Review
Using these heuristics, DocuScope has been developed into a text visualization
and analysis environment containing a catalog of over 300 million strings
organized as shown in Table 3. At the highest level, these strings fall into three
distinct clusters; at the mid-level, there are 18 dimensions; at the lowest level,
they are divided into a little over 140 classes. This hierarchical structure is, in
effect, a multivariate model of text with the potential power to articulate and
even improve upon rhetorical reading in genre analysis and discovery.
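One way to picture how such a hierarchy yields a multivariate description of a text is to roll class-level matches up to dimensions and clusters and express each level as proportions. The sketch below is an illustration under assumptions. The class labels are borrowed from Table 2, while the dimension and cluster names are placeholders, since Table 3 is not reproduced here, and the names HIERARCHY and profile are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# class label -> (dimension, cluster); the dimension and cluster names
# are placeholders, not DocuScope's actual categories.
HIERARCHY: Dict[str, Tuple[str, str]] = {
    "motion": ("dimension_1", "cluster_A"),
    "positive standard": ("dimension_2", "cluster_B"),
    "negative standard": ("dimension_2", "cluster_B"),
    "negative affect": ("dimension_3", "cluster_B"),
}


def profile(class_hits: List[str], level: str) -> Dict[str, float]:
    """Proportion of matched strings per category at the chosen level.

    level is one of "class", "dimension", or "cluster".
    """
    if level not in ("class", "dimension", "cluster"):
        raise ValueError(f"unknown level: {level}")
    counts: Counter = Counter()
    for cls in class_hits:
        dimension, cluster = HIERARCHY[cls]
        label = {"class": cls, "dimension": dimension, "cluster": cluster}[level]
        counts[label] += 1
    total = len(class_hits)
    return {label: n / total for label, n in counts.items()} if total else {}


# Class labels of the strings matched in one hypothetical text.
hits = ["motion", "positive standard", "negative standard", "negative affect"]
print(profile(hits, "dimension"))
print(profile(hits, "cluster"))
```

Each text then becomes a vector of proportions at whichever level is chosen, which is the kind of input a multivariate comparison of genres would need.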
6.1 Reading and Multivariate Models
It is our contention that the rhetorical reader approaches the task of genre
selection as a serial task with underlying multivariate components similar to those