allowed to "speak for themselves" in determining the estimate of f more than
would be the case if f were constrained to fall in a given parametric family.
There are many methods for density estimation, such as histograms, the naive
estimator, nearest-neighbour methods, the variable kernel, orthogonal series,
maximum penalized likelihood, general weight functions, and methods for bounded
domains and directional data [37]. We will use a kernel-based non-parametric
approach to density estimation in the development of the thesis, so a brief
review of the kernel method is included in this section. The kernel estimator
is defined as
$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) \qquad (1.4)$$
where h is the window width (also called the smoothing parameter or bandwidth),
n is the number of observations, and K is a kernel weight function that satisfies
the condition $\int_{-\infty}^{\infty} K(x)\,dx = 1$. The estimate $\hat{f}$ is a
probability density that inherits all the continuity and differentiability
properties of the kernel K; for example, if K is the normal density function,
then $\hat{f}$ is a smooth curve having derivatives of all orders.
Figure 1.4 shows an example of density estimation using the kernel method of
Eq. (1.4) for univariate data. The estimated density is superimposed on the
histogram of the data.
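The fixed-bandwidth estimator of Eq. (1.4) can be sketched directly in a few lines of Python. This is a minimal illustration, assuming a Gaussian kernel and a small hypothetical sample; the data values and bandwidth below are not from the thesis.

```python
import math

def gaussian_kernel(u):
    # Standard normal density; integrates to 1, as Eq. (1.4) requires of K.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    # Eq. (1.4): f_hat(x) = (1 / (n*h)) * sum_i K((x - X_i) / h)
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)

# Hypothetical univariate sample and bandwidth, for illustration only.
sample = [0.2, 0.5, 0.7, 1.1, 1.3, 2.0]
density_at_1 = kde(1.0, sample, h=0.4)
```

Because each kernel bump integrates to 1 and the estimator averages n of them, the resulting estimate is itself a valid probability density.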
A variable kernel estimator is obtained by allowing the scale parameter of the
"bumps" placed on the data points to vary from one data point to another. The
variable kernel estimate with smoothing parameter h is defined by
$$\hat{f}(t) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{h\,d_{j,k}} K\!\left(\frac{t - X_j}{h\,d_{j,k}}\right) \qquad (1.5)$$
where $d_{j,k}$ is the distance from $X_j$ to the kth nearest point in the set
comprising the other $n-1$ data points. The window width of the kernel placed on
the point $X_j$ is proportional to $d_{j,k}$, so that data points in regions
where the data are sparse have flatter kernels associated with them. For any
fixed k, the overall degree of smoothing depends on the parameter h, while the
choice of k determines how responsive the window widths are to very local detail.
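The adaptive estimator of Eq. (1.5) differs from Eq. (1.4) only in that each point carries its own bandwidth $h\,d_{j,k}$. A minimal sketch, again assuming a Gaussian kernel and hypothetical data:

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kth_nearest_distance(j, data, k):
    # d_{j,k}: distance from X_j to the kth nearest of the other n-1 points.
    dists = sorted(abs(data[j] - data[i]) for i in range(len(data)) if i != j)
    return dists[k - 1]

def variable_kde(t, data, h, k):
    # Eq. (1.5): each point X_j gets its own bandwidth h * d_{j,k},
    # so kernels are flatter where the data are sparse.
    n = len(data)
    total = 0.0
    for j in range(n):
        w = h * kth_nearest_distance(j, data, k)
        total += gaussian_kernel((t - data[j]) / w) / w
    return total / n

# Hypothetical sample; h and k chosen for illustration only.
sample = [0.2, 0.5, 0.7, 1.1, 1.3, 2.0]
density_at_1 = variable_kde(1.0, sample, h=1.0, k=2)
```

Note that each term $K((t - X_j)/(h\,d_{j,k}))$ is divided by its own width $h\,d_{j,k}$, so every bump still integrates to 1 and the overall estimate remains a density.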
The quality of a density estimate is evaluated by the closeness of the estimator
$\hat{f}$ to the true density f. The estimate $\hat{f}$ depends on the data as
well as on the kernel and the window width; this dependence will not generally
be expressed explicitly. For each x, $\hat{f}(x)$ can be thought of as a random
variable because of its dependence on the observations $X_1, \ldots, X_n$; any
use of probability, expectation and variance involving $\hat{f}$ is with respect
to its sampling distribution as a statistic based on these random observations.
The analysis of the statistical properties of the kernel estimator usually
assumes that the kernel K is a symmetric probability density function satisfying
$\int K(t)\,dt = 1$, $\int t\,K(t)\,dt = 0$, and $\int t^2 K(t)\,dt = k_2 \neq 0$,
and that the unknown density f has continuous derivatives of all orders required.
In the case of a Gaussian kernel, $k_2$ is the variance of the distribution with
this density.
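The three moment conditions on K are easy to check numerically. The sketch below, assuming the standard Gaussian kernel, approximates each integral with a midpoint rule over a range wide enough that the tails are negligible; for the standard normal, $k_2$ (its variance) equals 1.

```python
import math

def gaussian_kernel(t):
    # Standard normal density.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def moment(p, lo=-10.0, hi=10.0, step=1e-3):
    # Midpoint-rule approximation of the integral of t^p * K(t) dt.
    total = 0.0
    t = lo + step / 2.0
    while t < hi:
        total += (t ** p) * gaussian_kernel(t) * step
        t += step
    return total

# moment(0) ~ 1 (K integrates to 1), moment(1) ~ 0 (symmetry),
# moment(2) ~ 1 (k_2, the variance of the standard normal kernel).
```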