where the former denotes the features of instances in the low-quality domain that have a high-quality mapping, and the latter denotes the features of instances in the high-quality domain that have a low-quality mapping.
Finally, we create a new feature space for the processed instances that is twice the length of the original feature space. The instances are processed in three different ways. 1) For instances that appear in both the low-quality and high-quality domains, we concatenate the corresponding low-quality features with the original high-quality features. 2) For instances that appear only in the high-quality domain, we simply copy the high-quality features and concatenate them to the end. 3) For instances that appear only in the low-quality domain, we first generate the corresponding mapping to the high-quality domain, and then treat the mapped features as in case 2.
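A minimal sketch of this construction follows, assuming instances are stored as dictionaries of NumPy feature vectors; the function name map_to_high_quality is a hypothetical stand-in for the low-to-high mapping described above, not part of the original method.

import numpy as np

def build_joint_features(low_feats, high_feats, map_to_high_quality):
    """Build the doubled feature space for the processed instances.

    low_feats / high_feats: dicts mapping an instance id to a 1-D feature
    vector in the low- or high-quality domain.
    map_to_high_quality: hypothetical mapping from a low-quality feature
    vector into the high-quality domain (assumed name).
    """
    joint = {}
    for inst in set(low_feats) | set(high_feats):
        if inst in low_feats and inst in high_feats:
            # Case 1: present in both domains -- low-quality features
            # concatenated with the original high-quality features.
            joint[inst] = np.concatenate([low_feats[inst], high_feats[inst]])
        elif inst in high_feats:
            # Case 2: high-quality only -- copy the features and append them.
            joint[inst] = np.concatenate([high_feats[inst], high_feats[inst]])
        else:
            # Case 3: low-quality only -- map into the high-quality domain,
            # then handle the mapped features as in case 2.
            mapped = map_to_high_quality(low_feats[inst])
            joint[inst] = np.concatenate([mapped, mapped])
    return joint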
3.4 Instance Weight Tuning
We are now ready to exploit the instances from both domains to train a classifier. However, it is apparent that instances from the high-quality and low-quality domains should not be treated equally during training. Here we propose a method to set the initial weight of each instance according to the following heuristics.
- Instances in the high-quality domain should have higher weights. Furthermore, if the corresponding low-quality instance carries the same label, the weight should be higher still.
- Instances in the low-quality domain that can be mapped to the high-quality domain with the same label should have greater weights than instances that cannot be mapped to the high-quality domain.
We order the instances based on the above heuristics and assign initial weights as $W_i = \alpha \, W_{i-1}$, where $\alpha < 1$ and $W_i$ and $W_{i-1}$ denote the weights of the instances of order $i$ and $i-1$; $W$ represents the set of weights assigned to the instances.
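As an illustration, this geometric weight assignment can be sketched as follows; the instances are assumed to be already sorted by the heuristics with the highest-priority instance first, and the values of alpha and the starting weight w0 are assumed for illustration, since they are not specified here.

def initial_weights(ordered_instances, alpha=0.9, w0=1.0):
    """Assign W_i = alpha * W_{i-1} with alpha < 1.

    ordered_instances: instance ids sorted by the heuristics above,
    highest priority first, so earlier instances get larger weights.
    alpha and w0 are assumed example values.
    """
    weights = {}
    w = w0
    for inst in ordered_instances:
        weights[inst] = w
        w *= alpha
    return weights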
After setting the initial instance weights, we apply TrAdaBoost [3] to tune the weights iteratively. The intuition of TrAdaBoost is to use a different weight-updating function for the data of each domain. More specifically, we increase the weight more if an instance from the high-quality domain is predicted incorrectly. The assumption behind this setting is that data in the low-quality domain do not have as high a confidence score as those in the high-quality domain. The TrAdaBoost formulas for updating the instance weights are as follows:
$w_i^{t+1} = w_i^t \, \beta_t^{-|h_t(x_i) - c(x_i)|}$, if $x_i$ is in the high-quality domain

$w_i^{t+1} = w_i^t \, \beta^{|h_t(x_i) - c(x_i)|}$, if $x_i$ is in the low-quality domain

where $\beta_t$ and $\beta$ are multipliers calculated from the error rate and as in traditional AdaBoost, respectively, and $h_t(x_i)$ and $c(x_i)$ are the predicted and true labels of instance $x_i$.
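A rough sketch of one such update step is given below. The multipliers follow the original TrAdaBoost formulation [3] ($\beta_t$ from the weighted error rate of the current hypothesis, $\beta$ fixed from the number of low-quality instances and boosting iterations); the array layout and variable names are assumptions made for illustration.

import numpy as np

def tradaboost_weight_update(weights, preds, labels, is_high_quality,
                             error_rate, n_low, n_iters):
    """One TrAdaBoost-style weight update over all instances.

    weights, preds, labels: 1-D arrays over instances (labels in {0, 1}).
    is_high_quality: boolean mask marking high-quality-domain instances.
    error_rate: weighted error of the current hypothesis on the
    high-quality instances; n_low and n_iters follow TrAdaBoost [3].
    """
    beta_t = error_rate / (1.0 - error_rate)                     # error-rate multiplier
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_low) / n_iters))  # fixed low-quality multiplier
    miss = np.abs(preds - labels)                                # |h_t(x_i) - c(x_i)|
    return np.where(
        is_high_quality,
        weights * beta_t ** (-miss),  # mispredicted high-quality instances gain weight
        weights * beta ** miss,       # mispredicted low-quality instances lose weight
    )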