Data Quality Enhancement Technology to Improve Decision Support - Efficient Decision Support Systems: Practice and Challenges from Current to Future

outliers can improve the quality of stored data. Isolating outliers may also have a positive impact on the results of data analysis and data mining. Simple statistical estimates, such as the sample mean and standard deviation, can be significantly biased by individual outliers that lie far from the center of the distribution. In regression models, outliers can distort the estimated correlation coefficient [9]. Outliers in training and test data also cause several difficulties for decision-tree learning methods, as described by Mitchell [10], and for estimating the parameters of Gaussian membership functions [2]. For example, an outlying value of a nominal predictor attribute can unnecessarily increase the number of decision-tree branches associated with that attribute. This, in turn, distorts the calculation of the attribute selection criterion (e.g., information gain), and the predictive accuracy of the resulting decision tree may consequently decrease. As emphasized in [11], isolating outliers is an important step in preparing a data set for any kind of data analysis.
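
To make the decision-tree example concrete, the following Python sketch computes information gain for a nominal attribute before and after a single record receives an outlying value. The attribute values (`high`, `normal`, and the misspelled `hihg`) and the class labels are hypothetical, and information gain is computed in the usual ID3 style: the entropy of the class labels minus the weighted entropy of the branches.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(values, labels):
    """ID3-style information gain of a nominal attribute; also returns the branch count."""
    branches = defaultdict(list)
    for value, label in zip(values, labels):
        branches[value].append(label)  # one branch per distinct attribute value
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - remainder, len(branches)


# Hypothetical records: a nominal "humidity" attribute and a yes/no decision.
labels = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
clean = ["high", "high", "high", "high", "normal", "normal", "normal", "normal"]
# Same records, but record 3 carries an outlying (misspelled) value.
noisy = ["high", "high", "high", "hihg", "normal", "normal", "normal", "normal"]

ig_clean, branches_clean = information_gain(clean, labels)
ig_noisy, branches_noisy = information_gain(noisy, labels)
print(f"clean: {branches_clean} branches, gain = {ig_clean:.3f}")
print(f"noisy: {branches_noisy} branches, gain = {ig_noisy:.3f}")
```

With the clean values the attribute splits into two branches with a gain of about 0.19. The single outlying value creates a third, one-record branch that is trivially pure, which inflates the gain to about 0.59 even though the extra branch reflects a data error rather than real structure.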

1.2 Effect of data quality on fuzzy system technology

Fuzzy systems are expressed through membership functions. Outliers and noise are kinds of uncertainty that affect membership function parameters, for example those of a Gaussian membership function. A Gaussian membership function has two parameters, the mean and the standard deviation, which are tuned from the dataset. The mean and standard deviation are accurate parameters for the Gaussian membership function only if the desired data are first extracted from the dataset. Hence, to build a robust model, outliers must be detected and noisy data must be removed from the dataset.
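
The sensitivity of the Gaussian parameters can be seen in a few lines of Python. The sketch below assumes the common form μ(x) = exp(-(x - c)² / (2σ²)), with the center c and width σ set to the sample mean and population standard deviation; the temperature-like readings and the single erroneous value 100 are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_gaussian_mf(samples):
    """Set the Gaussian MF center and width to the sample mean and population std."""
    return mean(samples), pstdev(samples)


def gaussian_mf(x, center, sigma):
    """Gaussian membership degree exp(-(x - c)^2 / (2 sigma^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


# Hypothetical temperature readings; 100 stands in for a single sensor error.
clean = [10, 11, 9, 10, 12, 8, 10]
noisy = clean + [100]

c_clean, s_clean = fit_gaussian_mf(clean)
c_noisy, s_noisy = fit_gaussian_mf(noisy)
for name, c, s in (("clean", c_clean, s_clean), ("noisy", c_noisy, s_noisy)):
    print(f"{name}: center = {c:.2f}, sigma = {s:.2f}, mu(12) = {gaussian_mf(12, c, s):.3f}")
```

One erroneous reading moves the center from 10 to 21.25 and widens σ from about 1.2 to about 29.8, so a reading of 12 rises from a membership of roughly 0.25 to roughly 0.95. This is the kind of parameter distortion that motivates detecting outliers before tuning the membership functions.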

There is a direct, although rarely explored, relation between the uncertainty of input data and the fuzziness expressed by membership functions (MFs). Various assumptions about the type of input uncertainty distribution turn the discontinuous mappings provided by crisp logic systems into smoother mappings that are implemented naturally by fuzzy rules using specific types of MFs. Conversely, shifting uncertainty from the fuzzy rules to the input values may simplify logical rules, make the whole system easier to understand, and allow easy control of the degree of fuzziness in the system [12].
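
One simple way to see how input uncertainty turns a crisp rule into a fuzzy one is to assume Gaussian noise on the input. Under that assumption (made here for illustration, not taken from [12]), the degree to which the crisp condition x > a holds for a measured value x is the probability that the noisy value exceeds a, which equals the normal cumulative distribution function Φ((x - a)/s) for noise standard deviation s. The helper name `soft_threshold` is hypothetical.

```python
import math


def soft_threshold(x, a, s):
    """Hypothetical helper: degree to which 'x > a' holds when x has Gaussian noise with std s."""
    if s == 0:
        return 1.0 if x > a else 0.0  # crisp rule: a discontinuous step at x = a
    # P(x + e > a) for e ~ N(0, s^2) equals the normal CDF evaluated at (x - a) / s
    return 0.5 * (1.0 + math.erf((x - a) / (s * math.sqrt(2.0))))


# Compare the crisp rule (s = 0) with two levels of assumed input uncertainty.
for s in (0.0, 0.5, 2.0):
    print(s, [round(soft_threshold(x, a=5.0, s=s), 3) for x in (4.0, 5.0, 6.0)])
```

As s shrinks toward zero the curve approaches the crisp step, and increasing s makes the mapping smoother, so the noise level acts as a single knob for the degree of fuzziness.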

If the data regions of different classes overlap strongly, or if the data are noisy, membership degrees can be misleading with respect to rule confidence when the core region is modeled too small. In fact, as shown in [2], data regions with a high membership degree need not be the regions with a high rule confidence. This effect is called membership unrobustness [2].

Therefore, Fuzzy C-Means (FCM) clustering is used to detect outliers, and a statistical equation is used to remove noisy data, in order to improve the quality of the data.
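
The FCM step can be sketched with NumPy as follows. The clustering uses the standard FCM center and membership updates with fuzzifier m = 2 by default. The source does not spell out its statistical equation at this point, so the outlier rule here (flag points whose distance to the nearest cluster center exceeds the mean plus k standard deviations of those distances) is a hypothetical stand-in, as are the function names and the synthetic two-cluster data.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=150, tol=1e-6, seed=0):
    """Standard FCM: alternate center and membership updates until memberships settle."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each point's memberships sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]  # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero when a point sits on a center
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U, dist


def flag_outliers(dist, k=3.0):
    """Hypothetical rule: flag points farther than mean + k*std from their nearest center."""
    d_min = dist.min(axis=1)
    return np.where(d_min > d_min.mean() + k * d_min.std())[0]


# Synthetic data: two tight clusters plus one isolated point (index 40).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(20, 2)),
    rng.normal([10.0, 0.0], 0.5, size=(20, 2)),
    [[5.0, 8.0]],
])
centers, U, dist = fuzzy_c_means(X, c=2)
print("centers:", np.round(centers, 2))
print("flagged:", flag_outliers(dist))
```

Because the mean and standard deviation of the distances are computed with the outliers still included, a large k can mask outliers in small samples, so k is best treated as a parameter to tune rather than a fixed constant.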

2. Design of the method

As shown in Figure 1, the Weka software, developed at the University of Waikato [13], is used to pre-process the data in the dataset. After the data are cleaned, FCM with the statistical equation (described in the following section) is used to detect outliers, remove noisy data, and extract the desired data, yielding high-quality data. In the next step, a Type-1 fuzzy logic system (FLS) with a gradient descent algorithm is used to make decisions on these data, after the data have been analyzed to decide which parameters to use, such as temperature and humidity. The important part of this technique is that the gradient descent
