The Classification of Noisy Sequences Generated by Similar HMMs (Pattern Recognition and Machine Learning)

Abstract

A method for improving the classification performance of hidden Markov models (HMMs) is proposed. A k-nearest neighbors (kNN) classifier is used in the feature space produced by these HMMs. Only similar models are considered, under the assumption that the original sequences are noisy. Results on simulated data for a two-class classification problem are presented.

Keywords: Hidden Markov Model, Likelihood Derivatives, k Nearest Neighbors.

Introduction

HMMs are a powerful tool for modeling various processes and for pattern recognition. By their nature, Markov models deal directly with the spatio-temporal characteristics of sequences and have therefore become widely used [1], [2], [3]. However, despite their wide use, HMMs possess fairly low classification ability.

We understand the classification problem as follows. There is a set of objects partitioned into classes by an expert (supervised learning); this set is called the training set. We want to build an algorithm that is able to classify a test object from the original set. In this article the two-class classification problem using a matrix of distances between objects is discussed; in this setting each object is described by its distances to all other objects of the training set. The nearest neighbors method, the Parzen window method and the method of potential functions work with input data of this type. Sets of Gaussian time sequences generated by two HMMs with similar parameters are taken as the classification objects. To better approximate real-world data, all observed time sequences are distorted. The task is to compare the capabilities of the traditional classification method with a simple nearest neighbor classifier [4] in the space of first derivatives of the likelihood function with respect to the HMM parameters.


The remainder of this article is organized as follows. Section 2 introduces the proposed classification method. Section 3 presents the results of computational experiments. Section 4 summarizes our findings.

The Method of Sequence Classification

An HMM is completely described by the following parameters:

1. The initial state distribution $\pi = \{\pi_i\}$, where $\pi_i = P(q_1 = S_i)$, $1 \le i \le N$, and $N$ is the number of hidden states in the model.

2. The matrix of state transition probabilities $A = \{a_{ij}\}$, where $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, $1 \le i, j \le N$, $t = 1, \dots, T$, and $T$ is the length of the observed sequence.

3. The matrix of observation symbol probabilities $B = \{b_j(o_t)\}$, where $b_j(o_t) = P(o_t \mid q_t = S_j)$, $o_t$ is the symbol observed at time $t$, and $M$ is the number of observation states in the model. In this work the case is considered where the observation probability distribution is described by a mixture of normal distributions, in such a manner that each hidden state is associated with exactly one observable state: $b_j(o_t) = \mathcal{N}(o_t; \mu_j, \sigma_j^2)$.

Thus, an HMM is completely described by the matrix of state transition probabilities, the observation symbol probabilities and the initial state distribution:

$\lambda = (A, B, \pi)$.
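For concreteness, the parameters of a two-state model with Gaussian emissions can be stored as plain arrays; the numerical values below are purely illustrative and are not the ones used in this work.

```python
import numpy as np

# Illustrative two-state Gaussian HMM lambda = (A, B, pi); values are placeholders.
pi = np.array([0.5, 0.5])              # initial state distribution pi_i
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])             # transition probabilities a_ij
means = np.array([0.0, 1.0])           # mean of the Gaussian tied to each hidden state
stds = np.array([1.0, 1.0])            # standard deviation of each Gaussian
lam = (pi, A, means, stds)             # plays the role of lambda in the sketches below
```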

A classifier based on the log-likelihood function is traditionally used. The sequence $O$ is considered to be generated by the model $\lambda_1$ if (1) is satisfied:

$\log P(O \mid \lambda_1) > \log P(O \mid \lambda_2). \qquad (1)$

Otherwise, the sequence is considered to be generated by the model $\lambda_2$.
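As a reference point, a minimal sketch of the traditional rule (1) can be written with the hmmlearn library; the names train_seqs_1, train_seqs_2 (lists of one-dimensional numpy sequences) and fit_class_model are our own assumptions, not notation from the paper.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_class_model(seqs, n_states=2, seed=0):
    # Baum-Welch estimation of one model per class from that class's training sequences.
    X = np.concatenate(seqs).reshape(-1, 1)      # hmmlearn expects a 2-D (n_samples, n_features) array
    lengths = [len(s) for s in seqs]
    model = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=100, random_state=seed)
    model.fit(X, lengths)
    return model

def classify_by_loglik(o_test, model_1, model_2):
    # Rule (1): assign the sequence to the model with the larger log-likelihood.
    o = np.asarray(o_test).reshape(-1, 1)
    return 1 if model_1.score(o) > model_2.score(o) else 2

# model_1 = fit_class_model(train_seqs_1); model_2 = fit_class_model(train_seqs_2)
```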

Some authors (e.g. [5], [6]) propose to classify sequences in spaces of so-called secondary features. For example, the forward and backward probabilities, which are used to compute the probability that a sequence is generated by a model $\lambda$, can serve as secondary features. The first derivatives of the log-likelihood function, taken with respect to different model parameters, are also used. The authors of [5] additionally suggest including the original sequence in the feature vector. In this work the performance of two-class classification in the space of the first derivatives of the likelihood function is discussed.
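As one possible illustration of such derivative features, the sketch below computes the first derivatives of the log-likelihood with respect to the transition probabilities of a one-dimensional Gaussian HMM using the scaled forward-backward recursions; the function name and the parameterization (pi, A, means, stds) are our own assumptions.

```python
import numpy as np
from scipy.stats import norm

def grad_loglik_transmat(obs, pi, A, means, stds):
    """Return d log P(obs | lambda) / d a_ij, flattened, via scaled forward-backward."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    B = np.array([norm.pdf(obs, means[j], stds[j]) for j in range(N)]).T   # (T, N) emission densities
    alpha = np.zeros((T, N)); beta = np.zeros((T, N)); c = np.zeros(T)
    alpha[0] = pi * B[0]; c[0] = 1.0 / alpha[0].sum(); alpha[0] *= c[0]    # scaled forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        c[t] = 1.0 / alpha[t].sum(); alpha[t] *= c[t]
    beta[-1] = c[-1]                                                       # scaled backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = c[t] * (A @ (B[t + 1] * beta[t + 1]))
    # With this scaling, d log P / d a_ij = sum_t alpha_t(i) * b_j(o_{t+1}) * beta_{t+1}(j).
    grad = np.zeros((N, N))
    for t in range(T - 1):
        grad += np.outer(alpha[t], B[t + 1] * beta[t + 1])
    return grad.ravel()
```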

The classification problem is stated as follows. There are two groups of training sequences: the first group consists of sequences generated by $\lambda_1$, and the second of sequences generated by $\lambda_2$. Usually, in order to determine which class a test sequence $O_{test}$ belongs to, rule (1) is used. Because the model parameters $\lambda_1$ and $\lambda_2$ are unknown, one first has to estimate them (for example, with the Baum-Welch algorithm) and then classify according to rule (1).

If the competing models have similar parameters and the observed sequences are not purely Gaussian, the traditional classification technique based on (1) does not always give acceptable results.

The following scheme increases the discriminating ability of HMM-based features (a code sketch of the complete procedure is given after Step 3).

Step 1. For each training sequence $O_k^{(l)}$, $k = 1, \dots, K_l$, where $K_l$ is the number of training sequences of class $l$, a characteristic vector, which may consist of all or only part of the features, is formed. The likelihood function is calculated both for the model of the true class to which the training sequence belongs and for the model of the other class. As a result, the characteristic vector of a training sequence generated by the model $\lambda_l$ consists of two subvectors, where the first subvector consists of the features induced by the model $\lambda_1$ and the second of the features induced by the model $\lambda_2$.

Step 2. Similarly, the characteristic vector is calculated for the test sequence $O_{test}$.

Step 3. Using a metric-based classifier (e.g. kNN), the class to which $O_{test}$ belongs is determined.
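A minimal sketch of Steps 1-3, assuming the grad_loglik_transmat function from the earlier sketch and that lam1, lam2 are the estimated parameter tuples (pi, A, means, stds) of the two competing models; the names and the choice of k are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def characteristic_vector(seq, lam1, lam2):
    # Steps 1-2: derivative features computed under both competing models.
    return np.concatenate([grad_loglik_transmat(seq, *lam1),
                           grad_loglik_transmat(seq, *lam2)])

def knn_classify(train_seqs_1, train_seqs_2, test_seqs, lam1, lam2, k=5):
    X_train = np.array([characteristic_vector(s, lam1, lam2)
                        for s in train_seqs_1 + train_seqs_2])
    y_train = np.array([1] * len(train_seqs_1) + [2] * len(train_seqs_2))
    X_test = np.array([characteristic_vector(s, lam1, lam2) for s in test_seqs])
    knn = KNeighborsClassifier(n_neighbors=k)    # Step 3: metric-based classifier
    knn.fit(X_train, y_train)
    return knn.predict(X_test)
```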


Computational Experiments

The investigations were performed under the following assumptions. The models $\lambda_1$ and $\lambda_2$ are defined on hidden Markov chains with the same number of hidden states and differ only in their matrices of transition probabilities. The Gaussian distribution parameters of the two models are chosen to be identical, and the initial state distributions coincide as well. Thus the models turn out to be very close to each other and, hence, the sequences they generate differ from one another very weakly.

The training and test sequences were simulated by the Monte Carlo method. Five training sets of 100 sequences per class were generated for the investigations. For each training set, 500 test signals were generated per class. The classification results were averaged. The length of each sequence was set to 100.
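For illustration, sequences of this kind can be drawn from a Gaussian HMM by straightforward Monte Carlo simulation; the function below is a sketch under our own naming, not the simulator used in the paper.

```python
import numpy as np

def sample_sequence(pi, A, means, stds, T=100, rng=None):
    # Draw one observation sequence of length T from the Gaussian HMM (pi, A, means, stds).
    rng = np.random.default_rng() if rng is None else rng
    obs = np.empty(T)
    state = rng.choice(len(pi), p=pi)
    for t in range(T):
        obs[t] = rng.normal(means[state], stds[state])   # emit from the Gaussian of the current state
        state = rng.choice(len(A), p=A[state])           # move to the next hidden state
    return obs

# e.g. 100 training sequences per class: [sample_sequence(pi, A1, means, stds) for _ in range(100)]
```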

The Additive Noise

The first variant of distortion of a true sequence assumes additive superposition of a noise component distributed according to some law. Let $u$ denote a sequence simulated from the model $\lambda$. Superimposing the noise $e$ on this sequence yields the noisy sequence $y = u + w\,e$, where $w$ controls the strength of the distortion.
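A minimal sketch of this distortion, assuming $e$ is standard normal or standard Cauchy noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(u, w, law="normal"):
    # Additive distortion y = u + w * e; 'law' selects the noise distribution.
    e = rng.standard_normal(len(u)) if law == "normal" else rng.standard_cauchy(len(u))
    return np.asarray(u) + w * e
```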

The space of the first derivatives of the likelihood function with respect to the elements of the transition probability matrix was chosen as the feature space for the kNN classifier. In Tables 1-3 the following designations are used: APD – the average difference in percent between the results of the kNN classifier and the results of the traditional classification; AP – the average percentage of sequences correctly classified by the kNN classifier.

Table 1. Comparison of classification results under additive noise

w      Normal noise        Cauchy noise
       APD      AP         APD      AP
0.1    -1.5     89.12      11.18    88.56
0.2    -1.48    88.98       7.86    84.86
0.3    -2.14    85.82      22.62    84.26
0.4    -1.64    80.58       2.4     74.14
0.5    -0.52    72.28      -1.88    72.44
0.6    -4.28    58.98      -3.44    68.08
0.7    -2.66    53.18      -3.08    67.18
0.8    -2.82    49.26      -0.12    49.76
0.9    -1.44    49.88       0.92    50.52

The classification results for normally distributed noise are shown in the 2nd and 3rd columns of Table 1. As this experiment shows, the kNN classifier operating in the space of the first derivatives of the likelihood function gives worse results than the traditional classifier. This is explained by the fact that the noise and the sequences have the same normal distribution, and the Baum-Welch algorithm used for parameter estimation is tuned exactly to estimating the parameters of the observation probability distribution when that distribution is normal. The results for noise following the Cauchy distribution are given in the 4th and 5th columns of Table 1. In this case the situation is the opposite: the kNN classifier gives better results at noise levels $w \le 0.4$.

Probability Substitution of a Sequence by Noise

The second variant of distortion of a true sequence assumes partial substitution of the sequence by noise under a probability scheme, i.e. with some probability $p$ the true observation associated with a hidden state is replaced by a noise observation. The parameters of the noise varied between the different experiments.
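One possible reading of this scheme replaces each observation independently with probability $p$; the sketch below follows that reading, and the noise parameters in the usage comment are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def substitute_noise(u, p, noise_sampler):
    # With probability p, replace an observation by a draw from the noise distribution.
    u = np.asarray(u, dtype=float)
    mask = rng.random(len(u)) < p
    y = u.copy()
    y[mask] = noise_sampler(mask.sum())
    return y

# e.g. y = substitute_noise(u, p=0.3, noise_sampler=lambda n: rng.normal(5.0, 2.0, n))
```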

The results for normally distributed noise are given in columns 2-5 of Table 2. In columns 2 and 3 the superposition of such noise shifted the estimated expectation parameters, but the traditional classifier still showed better results than the proposed one. In the next two columns there is a stable improvement of the classification results for noise appearance probabilities p < 0.6; this is explained by the fact that the parameters of the noise-generator distribution are much larger than in the previous case. The results for Cauchy-distributed noise are shown in columns 6 and 7 of Table 2. In this case the kNN-based classifier shows a stable advantage.

Table 2. Comparison of classification results under probability substitution of a sequence by noise that does not depend on the hidden state

p      Normal noise I      Normal noise II     Cauchy noise
       APD      AP         APD      AP         APD      AP
0.1    -2.44    81.88      21.36    78.28      12.32    82.1
0.2    -2.96    78.38       3.96    52.88      17.4     76.76
0.3    -4.6     71.04       3.74    51.99      21.4     73.68
0.4    -2.62    69.84       6.46    52.61       6.42    61.32
0.5    -2.48    65.32       1.32    51.62       3.42    54.2
0.6    -2.45    59.81       2.84    52.72       2.58    52.22
0.7    -0.8     55.54      -1.3     50.85       0.71    50.49
0.8    -2.54    52.46       1.76    50.14       1.66    51.64
0.9    -0.1     50.1        0.04    50.24       1.09    50.83

Table 3. Comparison of classification results under probability substitution of a sequence by noise that depends on the hidden state

p      Normal noise        Cauchy noise
       APD      AP         APD      AP
0.1    -0.58    79.1       11.46    84.06
0.2     2.78    70.54      17.08    80.42
0.3     4.3     65.52      13.66    63.04
0.4     3.78    57.2        3.17    53.95
0.5     3.8     56.94       1.74    52.2
0.6     2.96    57.08       0.81    51.11
0.7     0.12    60.94       0.14    50.18
0.8    -0.6     70.04       0.85    51.53
0.9    -2.27    77.35       1.36    50.98

The classification results for normally distributed noise whose parameters depend on the hidden state in which the HMM resides are shown in the 2nd and 3rd columns of Table 3. The observed sequences then have a bimodal distribution instead of the unimodal one expected for the undistorted model. The traditional classifier is slightly worse than the proposed kNN classifier in this situation. The results for Cauchy-distributed noise are given in the 4th and 5th columns; in this case the noise that substitutes the original sequence differs from it only in the distribution law of the random variables. In this experiment a constant advantage of the proposed kNN-based method is observed. Similar results were obtained for Cauchy-distributed noise without a location shift but with a different scale of distribution.

Conclusion

In this article it was shown that the feature space generated by HMMs can be used for the classification of sequences generated by similar models. The first derivatives of the log-likelihood function with respect to the HMM parameters were used as features, and the kNN classifier was applied in this feature space. The studies have shown that, for similar models and distorted signals, the proposed method can improve the quality of classification.
