Database Reference
In-Depth Information
Table 14.11
Frequent pattern vs. probabilistic frequent patterns
Let user-specified
minsup
= 1.0 and
minProb
= 0.90
sup
(
{
d
}
)
Set of items
Prob
(
W
j
)
COUNT(
W
j
)
2
(
d
:0.9)
∈
t
2
∧
(
d
:0.5)
∈
t
3
0.45
8192
(
d
:0.9)
∈
t
2
∧
(
d
:0.5)
∈
t
3
0.45
8192
1
(
d
:0.9)
∈
t
2
∧
(
d
:0.5)
∈
t
3
+
0
.
05
+
8192
=
=
0
.
50
16384
∈
t
2
∧
∈
t
3
0
(
d
:0.9)
(
d
:0.5)
0.05
8192
j
Prob
(
W
j
)
j
COUNT (
W
j
)
=
1
=
32768
As
expSup
(
{
d
}
,
D
2
)
=
(2
×
0
.
45)
+
(1
×
0
.
50)
+
(0
×
0
.
05)
=
0
.
9
+
0
.
5
=
1
.
4
≥
minsup
,
{
d
}
is
an expected support-based
frequent pattern
.
As
Prob
(
sup
(
{
d
}
,
D
2
)
≥
minsup
)
=
0
.
45
+
0
.
50
=
0
.
95
≥
minProb
,
{
d
}
is also a
probabilistic
frequent pattern
.
sup
(
{
e
}
)
Set of items
Prob
(
W
j
)
COUNT(
W
j
)
2
(
e
:0.7)
∈
t
3
∧
(
e
:0.3)
∈
t
4
0.21
8192
(
e
:0.7)
∈
t
3
∧
(
e
:0.3)
∈
t
4
0.49
8192
1
(
e
:0.7)
∈
t
3
∧
(
e
:0.3)
∈
t
4
+
0
.
09
+
8192
=
0
.
58
=
16384
0
(
e
:0.7)
∈
t
3
∧
(
e
:0.3)
∈
t
4
0.21
8192
j
Prob
(
W
j
)
j
COUNT (
W
j
)
32768
As
expSup
(
{
e
}
,
D
2
)
=
(2
×
0
.
21)
+
(1
×
0
.
58)
+
(0
×
0
.
21)
=
0
.
7
+
0
.
3
=
1
.
0
≥
minsup
,
{
e
}
is
an expected support-based
frequent pattern
.
However, as
Prob
(
sup
(
{
e
}
,
D
2
)
=
1
=
≥
=
0
.
21
+
0
.
58
=
{
e
}
is
not
a
minsup
)
0
.
79
< minProb
,
probabilistic frequent pattern.
function (pmf). A pattern
X
is highly likely to be frequent (i.e.,
X
is a probabilistic
frequent pattern) if and only if its frequentness probability is no less than
minProb
,
i.e.,
P
(
sup
(
X
,
D
)
minProb
. The frequentness probability of
X
is the
probability that
X
occurs in at least
minsup
transactions of
D
. (Table
14.11
)
Note that frequentness probability is anti-monotonic: All subsets of a p-FP are
also p-FPs. Equivalently, if
X
is not a p-FP, then none of its supersets is a p-FP,
and thus all of them can be pruned. Moreover, when
minsup
increases, frequentness
probabilities of p-FPs decrease.
Bernecker et al. [
15
] used a dynamic computation technique in computing the
probability function
f
X
(
k
)
≥
minsup
)
≥
k
), which returns the probability that
the support of a pattern
X
equals to
k
. Summing the values of such a probability
function
f
X
(
k
) over all
k
≥
=
P
(
sup
(
X
,
D
)
=
minsup
gives the frequentness probability of
X
because
|
D
|
|
D
|
f
X
(
k
)
=
P
(
sup
(
X
,
D
)
≥
minsup
)
.
k
≥
minsup
k
≥
minsup
Any pattern
X
having the sum no less than
minProb
becomes a p-FP.
Sun et al. [
49
] proposed the top-down inheritance of support probability func-
tion (TODIS) algorithm, which runs in conjunction with a divide-and-conquer (DC)
approach, to mine probabilistic frequent patterns from uncertain data by extracting