Increasing the Accuracy of Software Fault Prediction Using Majority Ranking Fuzzy Clustering* - Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing


efficiency and product quality [19]. Several parameters, such as connectivity, intensity and distance among data characteristics, determine the level of similarity. In most clustering methods, each data element belongs to exactly one cluster, which is known as hard clustering. Fuzzy clustering, by contrast, is a soft clustering method: it calculates the degree to which each module ($X = x_1, x_2, \dots, x_n$) belongs to each of the specified clusters ($C = c_1, c_2, \dots, c_c$), expressed as membership values ($M = m_1, m_2, \dots, m_n$) that range from zero to one. In this method, a data element can belong to one or more clusters at the same time. C-means clustering is one of the most important fuzzy clustering techniques; it was developed in 1973 [25] and improved in 1981 [26]. A variety of applications have used this method to solve their problems. Its aim is to minimize the objective function shown in Eq. 1.

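$$ J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \left\| x_i - c_j \right\|^{2}, \quad 1 \le m < \infty \qquad (1) $$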

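Here, $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$, whose center is $c_j$; $m$ is the fuzzifier, a real number that controls the degree of fuzziness of the partition; and $\|x_i - c_j\|$ is the distance expressing the similarity between data point $x_i$ and the center of cluster $j$ ($c_j$).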
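To make Eq. 1 concrete, the following Python sketch evaluates $J_m$ for given data, centers and memberships. It is a minimal illustration, not the authors' code; it assumes NumPy, the Euclidean norm for $\|x_i - c_j\|$, and a hypothetical helper name `fcm_objective`.

```python
import numpy as np

def fcm_objective(X, centers, U, m=2.0):
    """Evaluate the fuzzy C-means objective J_m of Eq. 1.

    X: (n, d) data modules; centers: (c, d) cluster centers c_j;
    U: (n, c) membership degrees u_ij; m: fuzzifier.
    """
    # Squared Euclidean distance ||x_i - c_j||^2 for every module/center pair
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Weight each squared distance by u_ij^m and sum over all i and j
    return float(((U ** m) * d2).sum())
```

For two modules at (0, 0) and (2, 0) with the centers placed on them, crisp memberships give $J_m = 0$, while equal memberships of 0.5 give $J_2 = 0.25 \cdot (4 + 4) = 2$. The default $m = 2$ is our assumption; the source does not fix $m$.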

3.1 C-means Clustering Algorithm

In C-means clustering, a set of random initial membership values ($U^{(0)} = [u_{ij}]$) is first generated for each data module $x_i$ and each cluster $c_j$. Then, at step $k$, the center vector of each cluster is calculated from $U^{(k)}$ according to Eq. 2. After that, the memberships are updated from $U^{(k)}$ to $U^{(k+1)}$ according to Eq. 3. Finally, if the difference between $U^{(k)}$ and $U^{(k+1)}$ is less than the threshold, the iteration stops; otherwise, new cluster centers are computed based on Eq. 2.

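$$ c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \cdot x_i}{\sum_{i=1}^{n} u_{ij}^{m}}, \quad C^{(k)} = [c_j] \text{ with } U^{(k)} \qquad (2) $$

$$ u_{ij} = \frac{1}{\sum_{l=1}^{c} \left( \frac{\| x_i - c_j \|}{\| x_i - c_l \|} \right)^{\frac{2}{m-1}}} \qquad (3) $$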
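The iteration described in Section 3.1 can be sketched in Python as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `fuzzy_c_means`, the defaults `m=2.0`, `threshold=1e-5`, `max_iter=300` and `seed=0`, the Euclidean norm, and the max-absolute-change stopping test are all our choices. Because Eq. 3 divides by $m - 1$, the sketch requires $m > 1$.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, threshold=1e-5, max_iter=300, seed=0):
    """Fuzzy C-means sketch: random U(0), centers by Eq. 2, memberships by Eq. 3.

    X: (n, d) array of data modules; c: number of clusters.
    Returns (centers of shape (c, d), memberships U of shape (n, c)).
    """
    if m <= 1:
        raise ValueError("Eq. 3 requires the fuzzifier m > 1")
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: random initial memberships U(0); each row sums to 1
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 2 (Eq. 2): each center is the u_ij^m-weighted mean of the modules
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances ||x_i - c_j||, floored to avoid division by zero in Eq. 3
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, np.finfo(float).eps)
        # Step 3 (Eq. 3): ratio[i, j, l] = (||x_i - c_j|| / ||x_i - c_l||)^(2/(m-1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        # Step 4: stop once U(k) and U(k+1) differ by less than the threshold
        converged = np.max(np.abs(U_new - U)) < threshold
        U = U_new
        if converged:
            break
    return centers, U
```

As a quick check of Eq. 3, with $m = 2$ and a module at distances 1 and 2 from two centers, the memberships are $1/(1 + 0.25) = 0.8$ and $1/(4 + 1) = 0.2$, which sum to one. A hard assignment, if needed, can be read off as `U.argmax(axis=1)`.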

4 Proposed Method

Clustering of the software projects is the key part of the estimation method proposed in this section. To overcome the diversity and inconsistency of the projects collected in a dataset, the outliers and irrelevant projects must be separated from the others. Clustering the modules can increase their consistency by placing similar modules in the same clusters. Instead of one dataset that includes numerous irrelevant and inconsistent modules, there will be several subsets, each comprising consistent and similar modules. The clustering process is performed by analyzing the module features to discriminate the most similar modules and
