8.4.3.1 Cosine Measure
When the angle between the two vectors is a meaningful measure of their
similarity, the normalized inner product may be an appropriate similarity
measure:
\[
s(x_i, x_j) = \frac{x_i^T \cdot x_j}{\|x_i\| \cdot \|x_j\|}.
\]
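As an illustration of the cosine measure, here is a minimal Python sketch using only the standard library. The function name `cosine_similarity` and the choice to raise an error for a zero vector are assumptions made for this example, not part of the original text.

```python
import math


def cosine_similarity(x_i, x_j):
    """Normalized inner product of two equal-length numeric vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    norm_i = math.sqrt(sum(a * a for a in x_i))
    norm_j = math.sqrt(sum(b * b for b in x_j))
    # Assumption for this sketch: the measure is treated as undefined
    # when either vector has zero length.
    if norm_i == 0.0 or norm_j == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_i * norm_j)
```

Because only the angle matters, scaling either vector by a positive constant leaves the result unchanged, so `[1, 2, 3]` and `[2, 4, 6]` score as (numerically close to) 1.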
8.4.3.2 Pearson Correlation Measure
The normalized Pearson correlation is defined as:
\[
s(x_i, x_j) = \frac{(x_i - \bar{x}_i)^T \cdot (x_j - \bar{x}_j)}{\|x_i - \bar{x}_i\| \cdot \|x_j - \bar{x}_j\|},
\]
where \(\bar{x}_i\) denotes the average feature value of \(x_i\) over all dimensions.
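The Pearson measure amounts to the cosine measure applied to mean-centered vectors. The following Python sketch shows one way this could be computed. The function name `pearson_similarity` and the error raised for constant vectors are assumptions of this example.

```python
import math


def pearson_similarity(x_i, x_j):
    """Normalized Pearson correlation of two equal-length numeric vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same number of dimensions")
    mean_i = sum(x_i) / len(x_i)
    mean_j = sum(x_j) / len(x_j)
    # Subtract each vector's own average feature value.
    c_i = [a - mean_i for a in x_i]
    c_j = [b - mean_j for b in x_j]
    dot = sum(a * b for a, b in zip(c_i, c_j))
    norm_i = math.sqrt(sum(a * a for a in c_i))
    norm_j = math.sqrt(sum(b * b for b in c_j))
    # Assumption for this sketch: a constant vector centers to the zero
    # vector, so the measure is treated as undefined in that case.
    if norm_i == 0.0 or norm_j == 0.0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return dot / (norm_i * norm_j)
```

Unlike the cosine measure, this one is also unaffected by adding a constant to every feature of a vector, since the centering step removes it.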
8.4.3.3 Extended Jaccard Measure
The extended Jaccard measure is defined as:
\[
s(x_i, x_j) = \frac{x_i^T \cdot x_j}{\|x_i\|^2 + \|x_j\|^2 - x_i^T \cdot x_j}.
\]
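A short Python sketch of the extended Jaccard measure follows. The function name `extended_jaccard` and the handling of two all-zero vectors are assumptions of this example.

```python
def extended_jaccard(x_i, x_j):
    """Extended Jaccard similarity of two equal-length numeric vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    sq_i = sum(a * a for a in x_i)  # squared norm of x_i
    sq_j = sum(b * b for b in x_j)  # squared norm of x_j
    denom = sq_i + sq_j - dot
    # Assumption for this sketch: the denominator is zero only when both
    # vectors are all zeros, and the measure is treated as undefined there.
    if denom == 0:
        raise ValueError("extended Jaccard is undefined for two zero vectors")
    return dot / denom
```

For binary vectors the measure reduces to the familiar set-based Jaccard coefficient: `[1, 1, 0]` and `[1, 0, 1]` share one of three active features, giving 1/3.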
8.4.3.4 Dice Coefficient Measure
The Dice coefficient measure is similar to the extended Jaccard measure
and it is defined as:
\[
s(x_i, x_j) = \frac{2\, x_i^T \cdot x_j}{\|x_i\|^2 + \|x_j\|^2}.
\]
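The Dice coefficient can be sketched in Python in the same style. The function name `dice_coefficient` and the handling of two all-zero vectors are assumptions of this example.

```python
def dice_coefficient(x_i, x_j):
    """Dice coefficient of two equal-length numeric vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    sq_i = sum(a * a for a in x_i)  # squared norm of x_i
    sq_j = sum(b * b for b in x_j)  # squared norm of x_j
    denom = sq_i + sq_j
    # Assumption for this sketch: the measure is treated as undefined
    # when both vectors are all zeros.
    if denom == 0:
        raise ValueError("Dice coefficient is undefined for two zero vectors")
    return 2 * dot / denom
```

For the binary vectors `[1, 1, 0]` and `[1, 0, 1]` the result is 0.5. The two measures are monotonically related: if J is the extended Jaccard value, the Dice value equals 2J / (1 + J), so they rank pairs of vectors identically.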
8.4.4 The OCCT Algorithm
Dror et al. (2014) recently introduced the One-Class Clustering Tree (OCCT)
algorithm, a clustering tree for implementing one-to-many
data linkage. Data linkage refers to the task of matching entities from two
different data sources that do not share a common identifier (i.e. a foreign
key). Data linkage is usually performed among entities of the same type. It
is common to divide data linkage into two types, namely, one-to-one and
one-to-many. In one-to-one data linkage, the goal is to associate one record
in the first table with a single matching record in the second table. In the
case of one-to-many data linkage, the goal is to associate one record in