outliers can improve the quality of stored data. Isolating outliers may also have a positive
impact on the results of data analysis and data mining. Simple statistical estimates, such as the
sample mean and standard deviation, can be significantly biased by individual outliers that
lie far from the middle of the distribution. In regression models, outliers can affect
the estimated correlation coefficient [9]. The presence of outliers in training and testing data can
also cause difficulties for decision-tree learning methods, described by Mitchell
in [10], and for the parameters of Gaussian membership functions [2]. For example,
an outlying value of a predictive nominal attribute can unnecessarily increase the
number of decision tree branches associated with that attribute. In turn, this leads to
inaccurate calculation of the attribute selection criterion (e.g., information gain). Consequently,
the predictive accuracy of the resulting decision tree may decrease. As emphasized in
[11], isolating outliers is an important step in preparing a data set for any kind of data
analysis.
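The bias that a single outlier introduces into the sample mean and standard deviation can be illustrated with a minimal sketch. The readings below are hypothetical values chosen for illustration, not data from the chapter; the median is included only as a contrast, since it is far less sensitive to one extreme value.

```python
import statistics

# Hypothetical readings (illustrative only); the last value of
# `with_outlier` is an artificial outlier.
clean = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]
with_outlier = clean + [95.0]

mean_clean = statistics.mean(clean)
std_clean = statistics.stdev(clean)          # sample standard deviation

mean_out = statistics.mean(with_outlier)
std_out = statistics.stdev(with_outlier)
median_out = statistics.median(with_outlier)

print("clean:        mean=%.2f std=%.2f" % (mean_clean, std_clean))
print("with outlier: mean=%.2f std=%.2f median=%.2f"
      % (mean_out, std_out, median_out))
```

One extreme value is enough to move the mean well away from the bulk of the data and to inflate the standard deviation by more than an order of magnitude, while the median stays inside the range of the clean readings.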
1.2 Effect of data quality on fuzzy systems
Fuzzy systems are expressed by membership functions. Outliers and noise are kinds of
uncertainty that affect the membership function parameters, such as those of the Gaussian
membership function. A Gaussian membership function has two parameters, the mean and the
standard deviation, which are tuned based on the dataset. If only the desired data is extracted
from the dataset, the mean and standard deviation become accurate parameters for the Gaussian
membership function. Hence, to build a robust model, the outliers must be detected and the noisy
data must be removed from the dataset.
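As a rough illustration of why the tuned parameters matter, the sketch below fits a Gaussian membership function to a sample by taking the sample mean and standard deviation, once with and once without an outlier. The function names `gaussian_mf` and `fit_gaussian_mf`, the data, and the probe value are all assumptions made for this example; the chapter does not prescribe a particular fitting procedure here.

```python
import math
import statistics

def gaussian_mf(x, mean, sigma):
    """Gaussian membership degree of x, in (0, 1], peaking at x == mean."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

def fit_gaussian_mf(samples):
    """Hypothetical helper: tune (mean, sigma) directly from the samples."""
    return statistics.mean(samples), statistics.stdev(samples)

# Illustrative data; 95.0 is an artificial outlier.
clean = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]
with_outlier = clean + [95.0]

params_clean = fit_gaussian_mf(clean)
params_out = fit_gaussian_mf(with_outlier)

# Probe value that lies far from every clean reading.
probe = 60.0
mu_clean = gaussian_mf(probe, *params_clean)
mu_out = gaussian_mf(probe, *params_out)

print("membership of %.1f, clean fit:        %.3f" % (probe, mu_clean))
print("membership of %.1f, contaminated fit: %.3f" % (probe, mu_out))
```

With the contaminated fit, a value that resembles none of the clean readings still receives a substantial membership degree, because the outlier shifts the mean and widens the standard deviation.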
There is a direct, although rarely explored, relation between uncertainty of input data and
fuzziness expressed by Membership Functions (MFs). Various assumptions about the type
of input uncertainty distributions change the discontinuous mappings provided by crisp
logic systems into smoother mappings that are implemented in a natural way by fuzzy
rules using specific types of MFs. On the other hand, shifting uncertainty from fuzzy rules to
the input values may simplify logical rules, making the whole system easier to understand,
and allowing for easy control of the degree of fuzziness in the system [12].
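One simple case makes this relation concrete. If a crisp rule "x > threshold" is applied to a measurement whose true value is assumed to be normally distributed around the reading x with standard deviation sigma, the probability that the rule fires is the Gaussian cumulative distribution function evaluated at (x - threshold) / sigma, which is a smooth, sigmoid-shaped mapping. The sketch below assumes Gaussian input uncertainty; the threshold and sigma values are hypothetical.

```python
import math

def crisp_rule(x, threshold):
    """Crisp logic: a discontinuous step at the threshold."""
    return 1.0 if x > threshold else 0.0

def soft_rule(x, threshold, sigma):
    """P(X > threshold) for X ~ N(x, sigma**2): a smooth version of the step."""
    z = (x - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

threshold = 25.0   # hypothetical rule boundary
sigma = 1.0        # hypothetical measurement uncertainty

for x in (22.0, 24.0, 25.0, 26.0, 28.0):
    print("x=%.1f crisp=%.0f soft=%.3f"
          % (x, crisp_rule(x, threshold), soft_rule(x, threshold, sigma)))
```

Here sigma plays the role of a fuzziness control: as sigma shrinks the soft rule approaches the crisp step, and as it grows the transition widens.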
If regions of the data of different classes overlap heavily or if the data is noisy, the
values of the membership degrees can be misleading with respect to rule confidence if the
core region is modeled too small. In fact, data regions with a high membership
degree need not be the regions with a high rule confidence, an effect referred to as
membership unrobustness [2].
Therefore, Fuzzy C-Means (FCM) clustering is used to detect outliers, and a statistical
equation is used to remove the noisy data, in order to improve the quality of the data.
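The following is a minimal sketch of the standard FCM update rules on one-dimensional data, followed by a simple distance-based flag. The statistical equation used in the chapter is not reproduced in this excerpt, so the flagging rule here (distance to the nearest center greater than k times the median such distance) is a hypothetical stand-in, as are the function names, the data, the quantile-based initialization, and the value of k.

```python
import statistics

def fcm_1d(data, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Minimal 1-D fuzzy c-means. Returns (centers, memberships), where
    memberships are those used in the final center update."""
    ordered = sorted(data)
    n = len(ordered)
    # Deterministic start (an assumption): centers taken at evenly spaced
    # quantiles of the sorted data.
    centers = [ordered[int((i + 0.5) * n / n_clusters)]
               for i in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in data:
            dists = [abs(x - c) for c in centers]
            if min(dists) == 0.0:
                # x coincides with a center: assign it fully to that center.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
                row = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        new_centers = []
        for i in range(n_clusters):
            # c_i = sum_k u_ik^m x_k / sum_k u_ik^m
            weights = [row[i] ** m for row in memberships]
            new_centers.append(
                sum(w * x for w, x in zip(weights, data)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

def flag_outliers(data, centers, k=5.0):
    """Hypothetical rule: flag points whose distance to the nearest center
    exceeds k times the median nearest-center distance."""
    nearest = [min(abs(x - c) for c in centers) for x in data]
    limit = k * statistics.median(nearest)
    return [x for x, d in zip(data, nearest) if d > limit]

# Illustrative data: two tight groups plus one artificial outlier (70.0).
data = [19.6, 19.8, 19.9, 20.0, 20.0, 20.1, 20.2, 20.4,
        39.6, 39.8, 39.9, 40.0, 40.0, 40.1, 40.2, 40.4,
        70.0]

centers, memberships = fcm_1d(data)
outliers = flag_outliers(data, centers)
print("centers:", [round(c, 2) for c in centers])
print("flagged:", outliers)
```

Because standard FCM forces each point's memberships to sum to one, an outlier still receives sizeable membership degrees and pulls the nearest center toward itself; this is why the sketch flags points by distance rather than by membership value.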
2. Design of method
As shown in Figure 1, the Weka software, which was developed at the University of Waikato [13],
is used for pre-processing the data in the dataset. After cleaning the data, FCM with a
statistical equation (described in the following section) is used to detect outliers,
remove noisy data and extract the desired data, yielding data of high quality. In the next step,
a Type-1 fuzzy logic system (FLS) with the gradient descent algorithm is used to make a decision
on such data, after analyzing the data to decide on the parameters to be used, including
temperature, humidity and so on. The important part of this technique is that the gradient descent