Database Reference
In-Depth Information
And for Y
={
A
}
:
1
8
P
(
Y
|
D
,
CT
)
=
log
1
8
|
=−
=
L
(
code
(
Y
CT
))
3
.
00
So,
{
A
,
B
,
C
}
is assigned a code of length 0.68 bits, while
{
A
,
B
}
,
{
A
}
and
{
B
}
are
assigned codes of length 3 bits each.
Now, for any database
D
and code table
CT
over the same set of patterns
X
we
can compute
L
(
D
|
CT
) according to the following trivial lemma.
Lemma 8.9
Let
D
be a database, CT be a code table over
X
and code-optimal
D
for
,
cover
a cover function, and
usage
the usage function for
cover
.
1. For any t
∈
D
its encoded length, in bits, denoted by L
(
t
|
CT
)
,is
L
(
t
|
CT
)
=
L
(
code
(
X
|
CT
))
.
X
∈
cover
(
CT
,
t
)
2. The encoded size of
D
, in bits, when encoded by CT , denoted by L
(
D
|
CT
)
,is
L
(
D
|
CT
)
=
L
(
t
|
CT
)
.
t
∈
D
With Lemma 8.9, we can compute
L
(
D
|
M
), but we also need to know what
L
(
M
)
is, i.e., the encoded size of a code table.
Recall that a code table is a two-column table consisting of patterns and codes.
As we know the size of each of the codes, the encoded size of the second column is
easily determined: it is simply the sum of the lengths of the codes. The encoding of
the first column, containing the patterns, depends on the pattern type; a lossless and
succinct encoding should be chosen.
Definition 8.10
Let
D
be a database, CT a code table over
X
that is code-optimal
for
D
, and encode an encoding for elements of
X
. The size of CT in bits, denoted
by L
(
CT
|
D
)
, is given by
X
∈
CT
:
usage
D
(
X
)
=
0
|
L
(
CT
|
D
)
=
encode
(
X
)
|+|
code
(
X
|
CT
)
|
.
Note that we do not take patterns with zero usage into account, because they are not
used to code and do not get a finite code length.
With these results we have the total size of the encoded database.
D
T
Definition 8.11
Let
be a database with tuples drawn from
, let CT be a code
D
table that is code-optimal for
and
cover
a cover function. The
total compressed
size
of the encoded database and the code table, in bits, denoted by L
(
D
,
CT
)
is
given by
L
(
D
,
CT
)
=
L
(
D
|
CT
)
+
L
(
CT
|
D
)
.