is presumed for a given distribution P [6.5]. In other words, it is assumed that
there exists a Bayesian network, G, that captures all the conditional indepen-
dence relations implied by P . Consequently, this suggests a general learning
methodology: Construct a network G by testing the validity of any inde-
pendence assertions I(X, Z, Y ). In practice, we can use what is collectively
called the conditional independence test (CI test). If the statement I(X, Z, Y )
is supported by the data, it follows that X should be d-separated [6.10] from
Y by Z in G; otherwise, X is not d-separated from Y by Z.
As a digression from the ongoing discussion, we give a brief description
of the CI test. A common approach is to use a hypothesis testing proce-
dure discussed in the statistical literature [6.5], [6.12], [6.13]. To begin, the
conditional independence assertion (i.e., I(X, Z, Y )) is modeled as the null
hypothesis. Suppose we use the likelihood-ratio χ 2
test; the χ 2
statistics is
calculated by:
2 observed
G 2 =
log(expected/observed).
(6.3)
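As a concrete illustration, the statistic in (6.3) can be computed directly from two lists of counts. The function name and the example counts below are illustrative, not taken from the text:

```python
import math

def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(o * log(o / e)),
    summed over every distinct event; cells with a zero observed
    count contribute nothing to the sum."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

# When the observed counts match the expected counts exactly,
# the statistic is zero and the data give no evidence against the hypothesis.
print(g_squared([25, 25, 25, 25], [25, 25, 25, 25]))  # → 0.0
```

Larger discrepancies between the two lists drive G² upward, which is what the hypothesis test later converts into a p-value.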
Simply put, the statistic measures, over every distinct event, the discrep-
ancy between the actual count, observed, and the count expected under the
hypothesis, expected. In our case, because I(X, Z, Y ) implies

    P(x, y, z) = P(x | y, z) P(y, z) = P(x | z) P(y, z),

the statistic is computed by:

    G² = 2 Σ_{x,y,z} P(x, y, z) log [ P(x, y, z) / (P(y, z) P(x | z)) ].    (6.4)
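To see that (6.4) behaves as intended, consider a small distribution constructed so that X and Y really are independent given Z; the G² sum then vanishes. The specific probability tables below are made up for illustration:

```python
import math
from itertools import product

# Hypothetical binary distribution built as P(x|z) P(y|z) P(z),
# so I(X, Z, Y) holds exactly by construction.
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.8, 1: 0.2}}
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

def p_xyz(x, y, z):
    return p_x_given_z[z][x] * p_y_given_z[z][y] * p_z[z]

def g2():
    """G^2 of equation (6.4): 2 * sum over (x, y, z) of
    P(x,y,z) * log[ P(x,y,z) / (P(y,z) * P(x|z)) ]."""
    total = 0.0
    for x, y, z in product((0, 1), repeat=3):
        p = p_xyz(x, y, z)
        p_yz = sum(p_xyz(xx, y, z) for xx in (0, 1))  # marginal P(y, z)
        total += p * math.log(p / (p_yz * p_x_given_z[z][x]))
    return 2.0 * total

print(round(g2(), 9))  # ≈ 0.0, since the independence assertion holds
```

Replacing the factored tables with a joint distribution in which X and Y interact given Z would make the statistic strictly positive.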
Suppose the numbers of possible instantiations of the variables X, Y , and Z
are, respectively, v_X, v_Y, and v_Z; G² follows a χ² distribution with
(v_X - 1) × (v_Y - 1) × v_Z degrees of freedom. Checking our computed G² against the χ²
distribution, we obtain the p-value, which is “the smallest level of significance
for which the data lead to the rejection of the null hypothesis” [6.14]. If the
p-value is less than a predefined cutoff value α, the test shows strong evidence
to reject the hypothesis; otherwise, the hypothesis cannot be rejected.
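The accept/reject step can be sketched in a few lines. Since the exact χ² tail probability is not in the Python standard library, the sketch below substitutes the Wilson-Hilferty cube-root normal approximation for it; the function names and the sample inputs are illustrative:

```python
import math

def chi2_sf(x, df):
    """Approximate upper-tail probability of a chi-square distribution
    via the Wilson-Hilferty cube-root normal approximation (not exact,
    but adequate away from very small df)."""
    if x <= 0:
        return 1.0
    z = (((x / df) ** (1.0 / 3.0)) - (1.0 - 2.0 / (9.0 * df))) \
        / math.sqrt(2.0 / (9.0 * df))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def ci_test(g2, df, alpha=0.05):
    """Return (p_value, accepted): the hypothesis I(X, Z, Y) is rejected
    when the p-value falls below the cutoff alpha."""
    p_value = chi2_sf(g2, df)
    return p_value, p_value >= alpha

p_value, accepted = ci_test(20.0, df=10)
print(round(p_value, 3), accepted)  # small p-value: independence rejected
```

With a small G² (say 3.0 on the same 10 degrees of freedom) the p-value is large and the independence hypothesis cannot be rejected.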
Take the SGS algorithm [6.5] as an illustration. The algorithm begins
with a completely connected undirected graph. In other words, dependence
between every pair of variables is assumed. Then CI tests between all pairs
of connected nodes are conducted. When two nodes X and Y are found to
be conditionally independent given Z, the undirected edge between them is
removed so that I(X, Z, Y ) is not violated. When no more edges can be re-
moved, the undirected edges in the graph are oriented according to some rules
that conform to the conditional independence relations discovered previously.
This produces the final Bayesian network.
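The edge-removal phase described above can be sketched as follows, with `is_independent` standing in for a CI-test oracle (e.g. the G² test); the orientation phase is omitted, and all names are illustrative:

```python
from itertools import combinations

def sgs_skeleton(variables, is_independent):
    """Sketch of the SGS edge-removal phase: start from a complete
    undirected graph and delete the edge X-Y whenever some conditioning
    set Z renders X and Y conditionally independent.

    is_independent(x, y, z) is a CI-test oracle taking two variables
    and a frozenset of conditioning variables."""
    edges = set(frozenset(pair) for pair in combinations(variables, 2))
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        # Try every conditioning set Z drawn from the remaining variables.
        for size in range(len(others) + 1):
            if any(is_independent(x, y, frozenset(z))
                   for z in combinations(others, size)):
                edges.discard(frozenset((x, y)))
                break
    return edges
```

For example, with three variables A, B, C and an oracle asserting only I(A, {B}, C), the surviving skeleton is the chain A - B - C. Note the exhaustive search over conditioning sets is what makes SGS expensive as the number of variables grows.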
In general, there are three problems typical to the dependency analysis
approach. First, it is difficult to determine whether two nodes are dependent.