validation records and is not an equation), then it will be revealed as a significant
change.
The procedure has three major stages. The first stage initializes the
procedure. The second stage detects a significant change in the “patterns”
(rules) of the prebuilt data-mining model, as described in the previous
section. The third stage evaluates whether one or more variables in the group
of candidate attributes or target variables (A and T) show a significant
change between periods.
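As a rough illustration of the third stage, the sketch below compares the distribution of a single discrete variable in two periods. It assumes a Pearson chi-square test of homogeneity, which is a choice made for this sketch and is not stated in this passage. The function names, the example counts, and the caller-supplied critical value are all hypothetical.

```python
def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic for one discrete variable in two periods.

    counts_a and counts_b map each category to its record count in the
    respective period. Returns (statistic, degrees_of_freedom).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each period needs at least one record")
    grand_total = total_a + total_b
    categories = sorted(set(counts_a) | set(counts_b))

    statistic = 0.0
    for category in categories:
        observed_a = counts_a.get(category, 0)
        observed_b = counts_b.get(category, 0)
        column_total = observed_a + observed_b
        for observed, row_total in ((observed_a, total_a), (observed_b, total_b)):
            # Expected count if both periods shared one distribution.
            expected = row_total * column_total / grand_total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic, len(categories) - 1


def is_significant_change(counts_a, counts_b, critical_value):
    """True if the statistic exceeds a critical value chosen by the caller."""
    statistic, _ = chi_square_statistic(counts_a, counts_b)
    return statistic > critical_value


# Hypothetical counts of a two-valued attribute (low/high) in two periods.
period_1 = {"low": 90, "high": 10}
period_2 = {"low": 50, "high": 50}
# 3.841 is the chi-square critical value for 1 degree of freedom at the 5% level.
changed = is_significant_change(period_1, period_2, critical_value=3.841)
```

The critical value is passed in by the caller so that the sketch stays free of external libraries; in practice it would be looked up for the relevant degrees of freedom and significance level.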
The procedure assumes that sufficient data are available for each run of the
algorithm in every period. If this assumption does not hold, two or more
periods must be merged to obtain statistically significant outcomes.
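One way to picture the merging of periods is a greedy pass that joins consecutive periods until each group holds enough records. The sketch below is only an illustration under that assumption. The threshold `min_records`, the period labels, and the record counts are hypothetical, and the text does not say how periods should be chosen for merging.

```python
def merge_sparse_periods(period_counts, min_records):
    """Greedily merge consecutive periods until each group has enough records.

    period_counts is a list of (label, record_count) pairs in time order.
    Returns a list of (labels, total_count) groups.
    """
    groups = []
    current_labels, current_total = [], 0
    for label, count in period_counts:
        current_labels.append(label)
        current_total += count
        if current_total >= min_records:
            groups.append((current_labels, current_total))
            current_labels, current_total = [], 0

    # Trailing periods that never reached the threshold join the last group.
    if current_labels:
        if groups:
            last_labels, last_total = groups.pop()
            groups.append((last_labels + current_labels, last_total + current_total))
        else:
            groups.append((current_labels, current_total))
    return groups


# Hypothetical record counts for four consecutive periods.
merged = merge_sparse_periods(
    [("P1", 120), ("P2", 40), ("P3", 70), ("P4", 30)], min_records=100
)
# merged == [(["P1"], 120), (["P2", "P3", "P4"], 140)]
```

Only adjacent periods are merged so that the time order of the data is preserved.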
4.3 Application Evaluation
4.3.1 Data Set Description
The change-detection method proved useful when run on artificially generated
data sets. It was also evaluated on several benchmark data sets
(Zeira et al. [39]).
The first set of experiments illustrates the change-detection methodology on a
database obtained from a network of colleges in Israel. This data set describes
yearly (i.e., each time period is one year) dropouts of students from
technician and technical engineering colleges (we refer to this data set as
“Dropout”). The candidate attributes are:
regional area of the colleges (REGION), a discrete categorical variable; number of
divisions of studies in the institute (DIVISIONS), a discrete variable; number of
students in the institute (SUMP), a discretized variable where each value X
describes the interval [40(X − 1), 40X]; average number of students in class
(AVEP), discretized to two intervals (low and high); percent of technological
reserve students in the institute (TR_PER), discretized to two intervals (high and
low); and class of students (CLASS), a discrete categorical variable (technician
studies or technical engineering studies). The target variable (DROPOUT)
describes the dropout percentage in the institute (high, low, negative). A
dropout is a student who has not finished his or her studies according to the
predefined curriculum of the class.
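The discretization of SUMP can be sketched as follows, assuming each value X covers student counts from 40(X − 1) to 40X. The treatment of boundary values (a count of exactly 40 goes to X = 1, and a count of 0 is also placed in X = 1) is an assumption of this sketch, since the text does not say which end of the interval is open. The names `BIN_WIDTH`, `sump_bin`, and `sump_interval` are hypothetical.

```python
BIN_WIDTH = 40  # width of each SUMP interval, taken from the interval definition


def sump_bin(num_students):
    """Map a raw student count to its discretized SUMP value X."""
    if num_students < 0:
        raise ValueError("number of students cannot be negative")
    # Integer ceiling division; counts of 0 are placed in the first bin.
    return max(1, (num_students + BIN_WIDTH - 1) // BIN_WIDTH)


def sump_interval(x):
    """Return the (lower, upper) bounds of the interval described by X."""
    return (BIN_WIDTH * (x - 1), BIN_WIDTH * x)


# An institute with 95 students falls in X = 3, i.e. the interval (80, 120).
x = sump_bin(95)
bounds = sump_interval(x)
```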
The “Dropout” database covers a five-year period. Because of organizational
and social trends in society, some changes in the data-mining model are
commonly expected even after the model becomes stable. Therefore, the basic
assumption for this data set is that significant changes will be observed
over time.
The second set of experiments was performed on a stock market data set,
initially used in Last et al. [24] for evaluation of the IFN algorithm. The raw data
represent the daily stock prices of 373 companies from the Standard & Poor's 500