Information Technology Reference
In-Depth Information
decide which points are exemplars and, for every other point, which exemplar it be-
longs to. The “responsibility” , , sent from data point i to candidate exemplar
point k , reflects the accumulated evidence for how well-suited point k is to serve as
the exemplar for point i , taking into account other potential exemplars for point i . The
“availability” , , sent from candidate exemplar point k to point i , reflects the ac-
cumulated evidence for how appropriate it would be for point i to choose point k as its
exemplar, taking into account the support from other points that point k should be an
exemplar. , and , can be viewed as log-probability ratios. To begin with,
the availabilities are initialized to zero: , 0 . Then, the responsibilities are
computed using the rule:
r,,max , , (2)
In the first iteration, because the availabilities are zero, , is set to the input si-
milarity between point i and point k as its exemplar, minus the largest of the similarities
between point i and other candidate exemplars. This competitive update is data-driven
and does not take into account how many other points favor each candidate exemplar. In
later iterations, when some points are effectively assigned to other exemplars, their
availabilities will drop below zero as prescribed by the update rule below. These nega-
tive availabilities will decrease the effective values of some of the input similarities
, in the above rule, removing the corresponding candidate exemplars from com-
petition. For k = i , the responsibility , is set to the input preference that point k be
chosen as an exemplar, , , minus the largest of the similarities between point i and
all other candidate exemplars. This “self-responsibility” reflects the accumulated evi-
dence that point k is an exemplar, based on its input preference tempered by how ill-
suited it is to be assigned to another exemplar.
2.2
Adaptive Affinity Propagation
Rather than requiring that the number of clusters be pre-specified, affinity propagation
takes as input a real number , for each data point k so that data points with larger
values of , are more likely to be chosen as exemplars. These values , are
referred to as “preferences”, which is a kind of the self-similarity. The number of identi-
fied exemplars is influenced by the values of the input preferences. Frey [1] suggested
preference be set as the median of the input similarities ( P m ) without any prior know-
ledge. But in most cases, P m can't lead to optimal clustering solutions. Wang [8]
proposed an AAP algorithm to solve this problem. The AAP searches the space of prefe-
rence values in ∞,P /2 for the optimal value. As the AP tries to maximize the net
similarity, which is a score for explaining the data, and it represents how appropriate the
exemplars are. The score sums up all similarities between data points and their exemplar
(The similarity between exemplar to itself is the preference of the exemplar). The AP
aims at maximizing the net similarity and tests each data point whether it is an exemplar.
Therefore, the method which is using for computing the range of preferences can be
developed.
Search WWH ::




Custom Search