index over a five-year period (from 8/29/94 to 8/27/99) and it has been obtained
from the Microsoft MoneyCentral Web site. In Last et al. [24], we have applied
signal-processing techniques to partition each series of daily stock values into a
sequence of intervals having distinct slopes (trends). An average of 15.64 intervals
per company has been identified. The classification problem has been defined as
predicting the correct length of the current interval based on the known
characteristics of the current and preceding intervals. Consequently, we have
converted every sequence of m intervals related to a specific stock into m - 1
interval-pairs, each containing information about two consecutive intervals. This
resulted in a total of 5462 records of interval-pairs. The candidate input attributes
include the duration, slope, and fluctuation measured in each interval, as well as
the major sector of the corresponding stock (a static attribute). The target
attribute, which is the duration of the second interval in a pair, has been discretized
into five subintervals of nearly equal frequency. These subintervals have been
labeled very short, short, medium, etc. To restore the original order of data arrival,
we have sorted the records by the starting date of each interval (we refer to this
data set as “Stock”).
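The pairing and discretization steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Interval` record, the function names `interval_pairs` and `equal_frequency_bins`, and the use of integer bin indices (0 for the shortest durations) in place of the textual labels are all assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Interval:
    """Hypothetical record for one trend interval of a stock series."""
    start: str          # starting date, ISO format (assumed representation)
    duration: int       # length of the interval in trading days
    slope: float
    fluctuation: float


def interval_pairs(intervals: Sequence[Interval]) -> List[Tuple[Interval, Interval]]:
    """Turn a sequence of m intervals into m - 1 pairs of consecutive intervals."""
    return list(zip(intervals, intervals[1:]))


def equal_frequency_bins(values: Sequence[float], k: int) -> List[int]:
    """Assign each value a bin index in 0..k-1 so that bins hold nearly equal counts.

    Cut points are taken from the sorted values; ties can make bin sizes
    unequal, hence "nearly" equal frequency.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(j * n) // k] for j in range(1, k)]
    return [bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    series = [
        Interval("1994-08-29", 12, 0.4, 0.1),
        Interval("1994-09-14", 30, -0.2, 0.3),
        Interval("1994-10-26", 7, 0.9, 0.2),
        Interval("1994-11-04", 21, 0.1, 0.1),
    ]
    pairs = interval_pairs(series)
    # The target attribute is the duration of the second interval in each pair.
    targets = [second.duration for _, second in pairs]
    print(len(pairs), equal_frequency_bins(targets, 3))
```

The sample dates and values in the usage section are invented for illustration only. In the setting described above, `k` would be 5 and the pairs from all stocks would be pooled before the cut points are computed.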
4.3.2 Results - “Dropout” Data Set
This section summarizes five consecutive yearly periods processed by the IFN
algorithm, which has been shown to produce stable data-mining models (Last et al.
[25]). The base assumption for this data set was that significant changes would be
observed over time, due to organizational changes, increasing demand for
technological degrees, etc. Table 4.2, Table 4.3, Table 4.4, and Fig. 4.1 describe
the outcomes of applying the IFN algorithm to five consecutive years of the
“Dropout” database and using the change-detection methodology to detect
significant changes that occurred during these years.
Table 4.2. Results of the CD hypothesis testing on the “Dropout” database.
Year    e_{M_{K-1},K}    e_{M_{K-1},K-1}    d        H(95%)    1 - Pvalue
1996    -                -                  -        -         -
1997    42.2%            17.4%              24.8%    6.5%      100.0%
1998    54.8%            31.2%              23.6%    6.2%      100.0%
1999    35.0%            29.9%              5.1%     5.4%      87.7%
2000    42.3%            21.9%              20.4%    4.9%      100.0%
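The figures in Table 4.2 can be checked with a short sketch. It rests on several assumptions that the text here does not state explicitly: that the four rows of values belong to 1997 through 2000 (1996 has no preceding model), that e_{M_{K-1},K} is the error rate of the model built from period K-1 when applied to period K, that d is the absolute difference between the two error rates, and that a change is flagged when d exceeds H(95%). The function name `change_detected` is hypothetical.

```python
from typing import List, Tuple

# (year, e_{M_{K-1},K}, e_{M_{K-1},K-1}, H(95%)), all in percent, copied from Table 4.2.
# The assignment of rows to years is an assumption (see the lead-in).
TABLE_4_2: List[Tuple[int, float, float, float]] = [
    (1997, 42.2, 17.4, 6.5),
    (1998, 54.8, 31.2, 6.2),
    (1999, 35.0, 29.9, 5.4),
    (2000, 42.3, 21.9, 4.9),
]


def change_detected(e_current: float, e_previous: float, threshold: float) -> Tuple[float, bool]:
    """Return (d, flag), where d = |e_current - e_previous| and flag is d > threshold.

    The decision rule d > H(95%) is an assumed reading of the table columns.
    """
    d = round(abs(e_current - e_previous), 1)
    return d, d > threshold


if __name__ == "__main__":
    for year, e_cur, e_prev, h in TABLE_4_2:
        d, flag = change_detected(e_cur, e_prev, h)
        print(year, d, "change" if flag else "no change")
```

Under these assumptions the computed d values reproduce the d column, and only 1999 falls below its threshold, which agrees with its 1 - Pvalue of 87.7% being under 95%.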
 