b. For nominal attributes, hover over the cardinality value. If the attribute
does not have too many distinct values, a small pop-up box lists them.
Check that all of the values are acceptable.
2. Use either the histogram or the parallel coordinate plot to view the
distribution of each attribute. Look for outliers at the extremes of numeric
attributes.
3. Some outlying observations may not contain an extreme value for any
single attribute, yet may still be outliers because they do not fit the
relationship between two somewhat correlated attributes. To detect them,
open the dataset in both a correlation matrix and a scatter plot. The scatter
plot automatically synchronizes with the correlation matrix. In the
correlation matrix, click each correlated attribute pair to view it in the
scatter plot, and look for observations that do not fit the pattern of
correlation. For example, suppose that a dataset contains sales quantities
of winter coats by city, along with the location attributes of each city
(latitude and longitude). The normal pattern in the Northern Hemisphere
would be for winter coat sales to increase as latitude increases, so a
southern city with high winter coat sales would be an outlier.
4. Use the parallel coordinate plot to restrict the view to observations
having a selected category value. Some outliers are visible within a single
category even though they are not visible when all observations are viewed.
For example, an observation with Pregnant = Yes would not stand out when
viewing all observations, yet would stand out when viewing only the
Gender = Male observations.
5. When a dataset contains an attribute that can be derived from other
attributes, at least one of the attributes is redundant. Add a computed
column to the dataset that recalculates the derived attribute from the
attributes it depends on. Then use the parallel coordinate plot or scatter
plot to compare the newly computed column with the derived attribute. If
they don't match, there is a problem either in the derived attribute or in
one of the attributes used to compute its value.
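VisMiner performs the checks in steps 1b and 2 visually. As a rough scripted counterpart, the Python sketch below assumes each attribute is available as a plain list of values. `unexpected_values` lists nominal values that fall outside an acceptable set (the set itself has to come from your knowledge of the data), and `tukey_outliers` flags numeric extremes using Tukey's fences (more than 1.5 times the interquartile range beyond the quartiles). Tukey's fences are a common rule of thumb substituted here for visual inspection of a histogram; they are not something VisMiner is claimed to compute. Function names and data are hypothetical.

```python
from statistics import quantiles

def unexpected_values(values, allowed):
    """Distinct nominal values that are not in the set of acceptable values."""
    return sorted(set(values) - set(allowed))

def tukey_outliers(values, k=1.5):
    """Numeric values more than k * IQR below Q1 or above Q3 (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical attribute columns.
gender = ["M", "F", "F", "m", "M", "F"]
ages = [23, 25, 31, 29, 27, 24, 30, 26, 28, 240]

print(unexpected_values(gender, {"M", "F"}))  # ['m']
print(tukey_outliers(ages))                   # [240]
```

A flagged value is only a candidate: an age of 240 is clearly an entry error, but a legitimately extreme value should be examined before it is removed.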
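The pattern check in step 3 can be approximated in code, assuming two attributes that are roughly linearly related. This sketch, with hypothetical names and data modeled on the winter coat example, leaves each city out in turn, fits a least-squares line to the remaining cities, and flags the city if it lies more than `k` residual standard deviations from that line. The leave-one-out fit and the choice of `k = 3` are assumptions of this sketch, not part of VisMiner.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def pattern_outliers(xs, ys, k=3.0):
    """Indices of observations that break the linear x-y pattern.

    Each observation is left out in turn, a line is fitted to the others,
    and the observation is flagged if its distance from that line exceeds
    k times the population standard deviation of the others' residuals.
    """
    flagged = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(rest_x, rest_y)
        spread = pstdev([y - (intercept + slope * x) for x, y in zip(rest_x, rest_y)])
        residual = ys[i] - (intercept + slope * xs[i])
        if abs(residual) > k * spread:
            flagged.append(i)
    return flagged

# Hypothetical data: latitude of each city and its winter coat sales.
latitude = [30, 33, 36, 39, 42, 45, 48, 26]
coat_sales = [105, 155, 225, 275, 345, 395, 465, 500]  # last city: far south, high sales

print(pattern_outliers(latitude, coat_sales))  # [7]
```

Leaving each point out keeps a single extreme observation from pulling the fitted line toward itself and masking its own deviation, which is a real risk with small datasets.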
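Restricting the view to one category, as in step 4, amounts to examining attribute values within a subset of the observations. A minimal sketch, assuming hypothetical records stored as dictionaries keyed by attribute name; `value_counts_within` and `flag_in_subset` are illustrative helpers, not VisMiner functions:

```python
from collections import Counter

def value_counts_within(records, attr, subset_attr, subset_value):
    """Count the values of `attr` among records where subset_attr == subset_value."""
    return Counter(r[attr] for r in records if r[subset_attr] == subset_value)

def flag_in_subset(records, subset_attr, subset_value, attr, disallowed):
    """Indices of records in the subset whose `attr` takes a disallowed value."""
    return [i for i, r in enumerate(records)
            if r[subset_attr] == subset_value and r[attr] in disallowed]

patients = [
    {"Gender": "Female", "Pregnant": "Yes"},
    {"Gender": "Female", "Pregnant": "No"},
    {"Gender": "Male", "Pregnant": "No"},
    {"Gender": "Male", "Pregnant": "No"},
    {"Gender": "Male", "Pregnant": "Yes"},  # implausible combination
]

# Across all observations, "Yes" is an ordinary value...
print(Counter(r["Pregnant"] for r in patients))
# ...but within Gender = Male it should never appear.
print(value_counts_within(patients, "Pregnant", "Gender", "Male"))
print(flag_in_subset(patients, "Gender", "Male", "Pregnant", {"Yes"}))  # [4]
```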
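The derived-attribute cross-check in step 5 can be scripted the same way. In this sketch the derived attribute `Total` is assumed to equal `Quantity * UnitPrice`; the attribute names, formula, and tolerances are illustrative assumptions. `math.isclose` absorbs tiny floating-point differences so that only genuine mismatches are reported.

```python
import math

def derived_mismatches(records, derived, compute, rel_tol=1e-9, abs_tol=1e-9):
    """Recompute a derived attribute for each record and report disagreements
    as (index, stored value, recomputed value) tuples."""
    mismatches = []
    for i, r in enumerate(records):
        expected = compute(r)
        if not math.isclose(r[derived], expected, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, r[derived], expected))
    return mismatches

# Hypothetical orders; Total is assumed to be Quantity * UnitPrice.
orders = [
    {"Quantity": 3, "UnitPrice": 2.50, "Total": 7.50},
    {"Quantity": 4, "UnitPrice": 1.25, "Total": 5.00},
    {"Quantity": 2, "UnitPrice": 9.99, "Total": 29.97},  # inconsistent row
]

print(derived_mismatches(orders, "Total", lambda r: r["Quantity"] * r["UnitPrice"]))
```

A reported mismatch says only that the row is internally inconsistent; the error may lie in the stored total or in either of the attributes used to compute it.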
Outlier isolation
Outliers, once detected using any of the VisMiner viewers, are best isolated or
eliminated using the parallel coordinate plot:
1. To eliminate observations that are outliers on a single attribute, drag that
attribute's filter slider past the outlying observation(s). Then right-click the
slider and choose “Make dataset from filter”.
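Outside VisMiner, the same slider-based elimination can be sketched as a range filter that builds a new dataset. The function name `make_dataset_from_filter` is hypothetical and only mirrors the menu label; the sketch also assumes, as that label suggests, that the original dataset is left unchanged.

```python
def make_dataset_from_filter(records, attr, low=None, high=None):
    """Return a new dataset containing only records with low <= attr <= high.

    None leaves that end of the range open, like a slider left at its limit.
    Records are copied so the original dataset is not modified.
    """
    def in_range(r):
        v = r[attr]
        return (low is None or v >= low) and (high is None or v <= high)
    return [dict(r) for r in records if in_range(r)]

# Hypothetical dataset with one single-attribute outlier.
people = [
    {"Id": 1, "Age": 34},
    {"Id": 2, "Age": 41},
    {"Id": 3, "Age": 240},  # outlying age
    {"Id": 4, "Age": 29},
]

cleaned = make_dataset_from_filter(people, "Age", high=120)
print([r["Id"] for r in cleaned])  # [1, 2, 4]
```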