of it) to be related to a set of predictor variables in a manner similar to the
modeling of a numeric response variable using linear regression. Generalized linear
models include logistic regression and Poisson regression.
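As an illustration of logistic regression, the following minimal sketch fits a one-predictor model by plain gradient ascent on the log-likelihood. The function names (sigmoid, fit_logistic), the learning rate, and the iteration count are assumptions chosen for this example, not part of the text; a statistics package would normally use a more robust fitting procedure.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=2000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient ascent (illustrative only)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the average log-likelihood with respect to w and b.
        grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys)) / n
        w += learning_rate * grad_w
        b += learning_rate * grad_b
    return w, b


# Hypothetical toy data: a binary response that tends to be 1 for larger x.
w, b = fit_logistic([0, 1, 2, 3], [0, 0, 1, 1])
p_low = sigmoid(w * 0 + b)   # predicted probability at x = 0
p_high = sigmoid(w * 3 + b)  # predicted probability at x = 3
```

The categorical (here binary) response is related to the predictor through the sigmoid link, which is what distinguishes this from ordinary linear regression on a numeric response.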
Analysis of variance: These techniques analyze experimental data for two or more
populations described by a numeric response variable and one or more categorical
variables (factors). In general, a single-factor ANOVA (one-way analysis of variance)
problem involves a comparison of k population or treatment means to determine if at
least two of the means are different. More complex ANOVA problems also exist.
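The comparison of k treatment means can be sketched with the usual F statistic, the ratio of between-group to within-group mean squares. This is a minimal sketch; the function name one_way_anova_f and the sample data are assumptions made for illustration, and the code assumes the within-group variation is not zero.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three treatments with three observations each.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F value relative to the F distribution with (df_between, df_within) degrees of freedom suggests that at least two of the means differ.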
Mixed-effect models: These models are for analyzing grouped data, that is, data that can
be classified according to one or more grouping variables. They typically describe
relationships between a response variable and some covariates in data grouped
according to one or more factors. Common areas of application include multilevel
data, repeated measures data, block designs, and longitudinal data.
Factor analysis: This method is used to determine which variables combine to
generate a given factor. For example, in many psychiatric data sets, it is not possible
to measure a certain factor of interest directly (e.g., intelligence); however, it is often
possible to measure other quantities (e.g., student test scores) that reflect the factor
of interest. Here, none of the variables is designated as dependent.
Discriminant analysis: This technique is used to predict a categorical response
variable. Unlike generalized linear models, it assumes that the independent variables
follow a multivariate normal distribution. The procedure attempts to determine
several discriminant functions (linear combinations of the independent variables)
that discriminate among the groups defined by the response variable. Discriminant
analysis is commonly used in social sciences.
Survival analysis: Several well-established statistical techniques exist for survival
analysis. These techniques originally were designed to predict the probability that
a patient undergoing a medical treatment would survive at least to time t. Methods
for survival analysis, however, are also commonly applied to manufacturing settings
to estimate the life span of industrial equipment. Popular methods include
Kaplan-Meier estimates of survival, Cox proportional hazards regression models, and
their extensions.
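The Kaplan-Meier estimate of the probability of surviving at least to time t can be sketched as a running product over the observed event times. This is a minimal sketch; the function name kaplan_meier, its argument names, and the sample data are assumptions for illustration. An observation flagged False is treated as censored, meaning the subject left the study without the event being observed.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    durations: time until the event or until censoring, per subject.
    observed:  True if the event was observed, False if censored.
    """
    data = sorted(zip(durations, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Gather every subject whose recorded time equals t.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical data: five subjects, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

Subjects censored at time t are still counted as at risk at t, which is the usual convention; they only drop out of the risk set afterwards.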
Quality control: Various statistics can be used to prepare charts for quality control,
such as Shewhart charts and CUSUM charts (both of which display group summary
statistics). These statistics include the mean, standard deviation, range, count,
moving average, moving standard deviation, and moving range.
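Several of the statistics listed above can be computed in a few lines. The sketch below is illustrative only: the function names, the window size, and the target value are assumptions, and the cusum function shows just the basic cumulative sum of deviations from a target, without the decision limits a full CUSUM chart would add.

```python
from statistics import mean, stdev


def group_summaries(groups):
    """Per-group mean, standard deviation, range, and count."""
    return [
        {"mean": mean(g), "stdev": stdev(g), "range": max(g) - min(g), "count": len(g)}
        for g in groups
    ]


def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))
    ]


def moving_range(values):
    """Absolute difference between each pair of consecutive values."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def cusum(values, target):
    """Running sum of deviations from a target value."""
    total = 0.0
    sums = []
    for v in values:
        total += v - target
        sums.append(total)
    return sums


# Hypothetical measurements grouped into two samples.
summaries = group_summaries([[1, 2, 3], [4, 6, 8]])
```

A Shewhart chart would plot the per-group values against control limits, while a CUSUM chart plots the running sums, which makes small sustained shifts from the target easier to see.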
13.2.2 Views on Data Mining Foundations
Research on the theoretical foundations of data mining has yet to mature. A solid and
systematic theoretical foundation is important because it can help provide a coherent
 