Adaptive Affinity Propagation Clustering in MapReduce Environment - Technologies and Applications of Artificial Intelligence

Information Technology Reference

In-Depth Information

decide which points are exemplars and, for every other point, which exemplar it be-

longs to. The “responsibility” , , sent from data point i to candidate exemplar

point k , reflects the accumulated evidence for how well-suited point k is to serve as

the exemplar for point i , taking into account other potential exemplars for point i . The

“availability” , , sent from candidate exemplar point k to point i , reflects the ac-

cumulated evidence for how appropriate it would be for point i to choose point k as its

exemplar, taking into account the support from other points that point k should be an

exemplar. , and , can be viewed as log-probability ratios. To begin with,

the availabilities are initialized to zero: , 0 . Then, the responsibilities are

computed using the rule:

r,,max , , (2)

In the first iteration, because the availabilities are zero, , is set to the input si-

milarity between point i and point k as its exemplar, minus the largest of the similarities

between point i and other candidate exemplars. This competitive update is data-driven

and does not take into account how many other points favor each candidate exemplar. In

later iterations, when some points are effectively assigned to other exemplars, their

availabilities will drop below zero as prescribed by the update rule below. These nega-

tive availabilities will decrease the effective values of some of the input similarities

, in the above rule, removing the corresponding candidate exemplars from com-

petition. For k = i , the responsibility , is set to the input preference that point k be

chosen as an exemplar, , , minus the largest of the similarities between point i and

all other candidate exemplars. This “self-responsibility” reflects the accumulated evi-

dence that point k is an exemplar, based on its input preference tempered by how ill-

suited it is to be assigned to another exemplar.

2.2

Adaptive Affinity Propagation

Rather than requiring that the number of clusters be pre-specified, affinity propagation

takes as input a real number , for each data point k so that data points with larger

values of , are more likely to be chosen as exemplars. These values , are

referred to as “preferences”, which is a kind of the self-similarity. The number of identi-

fied exemplars is influenced by the values of the input preferences. Frey [1] suggested

preference be set as the median of the input similarities ( P m ) without any prior know-

ledge. But in most cases, P m can't lead to optimal clustering solutions. Wang [8]

proposed an AAP algorithm to solve this problem. The AAP searches the space of prefe-

rence values in ∞,P /2 for the optimal value. As the AP tries to maximize the net

similarity, which is a score for explaining the data, and it represents how appropriate the

exemplars are. The score sums up all similarities between data points and their exemplar

(The similarity between exemplar to itself is the preference of the exemplar). The AP

aims at maximizing the net similarity and tests each data point whether it is an exemplar.

Therefore, the method which is using for computing the range of preferences can be

developed.

Search WWH ::

Custom Search

Home