Information Technology Reference
In-Depth Information
(a) (b)
Fig. 3.10. Helmholtz machine, proposed by Dayan et al. [50] for discovering hierarchical
structure in data: (a) sketch of the architecture (image adapted from [50]); (b) illustration of
the bars problem.
X
X
[ wake ] q j = σ (
[ sleep ] .
s i φ ij + φ 0 j );
p j = σ (
s k θ kj + θ 0 j )
i
k
To estimate network parameters, Hinton et al. [98] proposed the wake-sleep algo-
rithm. The recognition weights φ are trained in the sleep mode to reproduce the
higher-level representations from the generated fantasies. Symmetrically, the gen-
erative weights θ are trained during the wake phase to produce fantasies from the
higher-level representations that match the current inputs:
[ wake ] ∆θ kj = s k ( s j p j );
[ sleep ] .
∆φ ij = s i ( s j q j )
Frey et al. [75] showed that this algorithm is able to discover hierarchical struc-
ture from data. They used the bars problem, illustrated in Figure 3.10(b). Data is
generated as follows. First, it is decided randomly if the vertical or the horizontal
orientation is used. Next, the lines or the columns of a 16 × 16 image are turned on
with P = 0 . 25 , depending on the chosen orientation. Finally, individual pixels are
turned on with a probability of P = 0 . 25 . The authors used a three-layer network
with 36 units in the middle layer and 4 units in the top layer. To enforce a solu-
tion where individual bars are added to the image, the middle-to-bottom weights
were constrained to be non-negative. The generative biases of the middle units were
initialized to 4 to facilitate a sparse representation.
After running the wake-sleep algorithm, the generative weights of 32 middle
units resembled vertical or horizontal bars. Furthermore, one of the top units indi-
cated the orientation by exciting the vertical bar units and inhibiting the horizontal
bar units in the middle layer. Hence, the network discovered the data generation
mechanism. However, if the non-negativity constraint was not used and the bias
was initialized to zero, the network did not find the optimal solution and modeled
the data in a more complicated way.
Search WWH ::




Custom Search