Fig. 10.4 Displaying pattern (P1) w.r.t. DSM in Table 10.7 (figure not reproduced; node labels: phd-a-msc, business-intelligence, human-space-computing)
Fig. 10.5 Displaying pattern (P2) w.r.t. DSM in Table 10.7 (figure not reproduced; node labels: phd-a-msc, scholarships, management)
10.4 Method and Experimental Setup
The method used here integrates the rule optimization framework presented in
[37, 38] with the structure-preserving flat representation of tree-structured data presented
in [14], which allows standard statistical measures to be applied directly
to tree-structured data. Figure 10.6 shows the proposed framework, which also
describes the experimental process. The database structure model (DSM) [14] is
extracted from the tree-structured data/XML documents to preserve the structural
characteristics of the data. The extracted DSM is used to create the flat representation
of the tree-structured data (the region enclosed by the dashed square in Fig. 10.6). An
example of the conversion process is given in Sect. 10.3. Once the tree-structured
dataset has been converted to a flat table format (FDT), it is divided into
two parts. The first part is used for frequent pattern mining, statistical evaluation and
rule filtering, while the second part acts as sample data drawn from the dataset
and is used to verify the accuracy and coverage of the discovered rules. In the pre-processing
phase, missing values are handled using common distribution-based missing value
imputation [27], and an equal-width binning approach is used to discretise the values
of any continuous attributes. Equal-width binning groups the data into
several buckets or bins of the same interval size. It is
implemented with the following steps [35]: (1) Calculate the range of variable
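
As an illustration of equal-width binning in general (not necessarily the exact steps given in [35]), a minimal Python sketch follows. The function name `equal_width_bins`, the parameter `n_bins` and the sample `ages` values are hypothetical and chosen only for this example. The sketch computes the range of the variable, divides it by the number of bins to obtain a common interval size, and maps each value to the index of the interval it falls into.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins intervals of equal width.

    Illustrative sketch only; names and behaviour are assumptions,
    not the implementation used in the chapter.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not values:
        return []

    lo, hi = min(values), max(values)
    value_range = hi - lo              # range of the variable
    if value_range == 0:
        return [0] * len(values)       # all values identical: a single bin

    width = value_range / n_bins       # common interval size
    labels = []
    for v in values:
        idx = int((v - lo) / width)    # index of the interval containing v
        labels.append(min(idx, n_bins - 1))  # the maximum goes into the last bin
    return labels


ages = [18, 22, 25, 31, 40, 58]        # hypothetical continuous attribute
print(equal_width_bins(ages, 4))       # prints [0, 0, 0, 1, 2, 3]
```

With four bins over the range 18 to 58, each interval is 10 units wide, so the three smallest values share the first bin while 58 is placed in the last one. Because the bin boundaries depend only on the minimum and maximum, a single outlier can leave most bins nearly empty, which is a known limitation of equal-width discretisation.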
 