Figure . . Scagnostics
the Charles River (CHAS) and proportion of residential land zoned for lots over
25,000 sq. ft. (ZN). The scagnostic Outlying measure flagged the few cases that bounded
the Charles River. The location of the point representing this scatterplot in the other
panels of the scagnostics SPLOM characterizes the plot as relatively skewed, skinny,
striated, and stringy, but not convex.
5.5.2 Sequence Analysis
A sequence is a list of objects, e.g. x, y, z. The ordering of the list is given by an
order relation. In many applications of sequence analysis, objects are represented by
tokens and sequences are represented by strings of tokens. In biosequencing, for example,
the letters A, C, T and G are used to represent the four bases in a DNA strand.
Suppose we are given a string of n tokens and want to find the most frequently
occurring substrings of length m in the string (m < n). A simple (not especially
fast) algorithm to do this involves generating candidate substrings and testing
them against the target string. We begin with strings of length 1, each consisting of
a different token. Then we build candidate substrings of length 2. We count the
frequency of each of these substrings in the target string. Using any of these length-2
substrings with a count greater than zero, we build candidate substrings of length 3.
We continue the generate-and-test process until we have tested the candidates of
length m or until all counts are zero. This stepwise procedure traverses only a subset
of the branches of the tree of all possible substrings (a string that never occurs cannot
be extended into one that does), so we do not have as many tests to perform.
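The generate-and-test procedure can be sketched in Python as follows. This is a minimal sketch, not the author's implementation, assuming that each token is a single character, that occurrences may overlap, and that each surviving string is extended by appending one token on the right. The function names `count_occurrences` and `most_frequent_substrings` and the optional `alphabet` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def count_occurrences(target: str, pattern: str) -> int:
    """Test step: count (possibly overlapping) occurrences of pattern in target."""
    k = len(pattern)
    return sum(1 for i in range(len(target) - k + 1) if target[i:i + k] == pattern)


def most_frequent_substrings(
    target: str, m: int, alphabet: Optional[str] = None
) -> Tuple[List[str], int]:
    """Level-wise generate-and-test search for the most frequent length-m substrings.

    Returns (sorted list of the most frequent substrings, their count),
    or ([], 0) if no substring of length m occurs in target.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    # Assumption: the token set defaults to the distinct characters in target.
    tokens = sorted(set(alphabet if alphabet is not None else target))

    # Length 1: one candidate per distinct token; keep those with nonzero counts.
    survivors: Dict[str, int] = {}
    for t in tokens:
        c = count_occurrences(target, t)
        if c > 0:
            survivors[t] = c

    length = 1
    while length < m and survivors:
        # Generate: extend each surviving string by one token on the right.
        candidates = [s + t for s in survivors for t in tokens]
        # Test: count each candidate in the target and prune zero counts.
        survivors = {}
        for cand in candidates:
            c = count_occurrences(target, cand)
            if c > 0:
                survivors[cand] = c
        length += 1

    if not survivors:  # all counts reached zero before length m
        return [], 0
    best = max(survivors.values())
    return sorted(s for s, c in survivors.items() if c == best), best


# Usage on a short DNA-like string over the tokens A, C, G, T.
print(most_frequent_substrings("ACGACGTT", 3))  # (['ACG'], 2)
```

Because every occurring substring of length k + 1 has an occurring prefix of length k, pruning the zero-count candidates at each level never discards an answer. Each test here rescans the whole target string, which keeps the sketch short but, as the text warns, not especially fast.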
Embedding a sequence analysis in a graph layout often gives us a simple way to
visualize these substrings. The layout may be based on known coordinates (as in
geographic problems) or on an empirical layout using adjacency in the sequence list