where the former denotes the features of instances in the low-quality domain that have a high-quality mapping, and the latter denotes the features of instances in the high-quality domain that have a low-quality mapping.
Finally, we create a new feature space for the processed instances that is twice the length of the original feature space. The instances are processed in three different ways. 1) For instances that appear in both the low-quality and high-quality domains, we concatenate the corresponding low-quality features with the original high-quality features. 2) For instances that appear only in the high-quality domain, we simply copy the high-quality features and concatenate them to the end. 3) For instances that appear only in the low-quality domain, we first generate the corresponding mapping to the high-quality domain, and then treat the mapped features as in case 2.
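A minimal sketch of this construction follows, assuming instances are stored as dictionaries of NumPy feature vectors; the function name map_to_high_quality is a hypothetical stand-in for the low-to-high mapping described above, not part of the original method.

import numpy as np

def build_joint_features(low_feats, high_feats, map_to_high_quality):
    """Build the doubled feature space for the processed instances.

    low_feats / high_feats: dicts mapping an instance id to a 1-D feature
    vector in the low- or high-quality domain.
    map_to_high_quality: hypothetical mapping from a low-quality feature
    vector into the high-quality domain (assumed name).
    """
    joint = {}
    for inst in set(low_feats) | set(high_feats):
        if inst in low_feats and inst in high_feats:
            # Case 1: present in both domains -- low-quality features
            # concatenated with the original high-quality features.
            joint[inst] = np.concatenate([low_feats[inst], high_feats[inst]])
        elif inst in high_feats:
            # Case 2: high-quality only -- copy the features and append them.
            joint[inst] = np.concatenate([high_feats[inst], high_feats[inst]])
        else:
            # Case 3: low-quality only -- map into the high-quality domain,
            # then handle the mapped features as in case 2.
            mapped = map_to_high_quality(low_feats[inst])
            joint[inst] = np.concatenate([mapped, mapped])
    return joint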
3.4 Instance Weight Tuning
We are now ready to exploit the instances from both domains to train a classifier. However, it is apparent that instances from the high-quality and low-quality domains should not be treated equally during training. Here we propose a method to set the initial weight of each instance according to the following heuristics.
- Instances in the high-quality domain should have higher weights. Furthermore, if the corresponding low-quality instance carries the same label, the weight should be higher still.
- Instances in the low-quality domain that can be mapped to the high-quality domain with the same label should have greater weights than instances that cannot be mapped to the high-quality domain.
We order the instances based on the above heuristics and assign initial weights as $W_i = \alpha \, W_{i-1}$, where $\alpha < 1$ and $W_i$ and $W_{i-1}$ denote the weights of the instances of order $i$ and $i-1$; $W$ represents the set of weights assigned to the instances.
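As an illustration, this geometric weight assignment can be sketched as follows; the instances are assumed to be already sorted by the heuristics with the highest-priority instance first, and the values of alpha and the starting weight w0 are assumed for illustration, since they are not specified here.

def initial_weights(ordered_instances, alpha=0.9, w0=1.0):
    """Assign W_i = alpha * W_{i-1} with alpha < 1.

    ordered_instances: instance ids sorted by the heuristics above,
    highest priority first, so earlier instances get larger weights.
    alpha and w0 are assumed example values.
    """
    weights = {}
    w = w0
    for inst in ordered_instances:
        weights[inst] = w
        w *= alpha
    return weights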
After setting the initial instance weights, we apply TrAdaBoost [3] to tune the weights iteratively. The intuition of TrAdaBoost is to use a different weight-updating function for the data of each domain. More specifically, we increase the weight more if an instance from the high-quality domain is predicted incorrectly. The assumption behind this setting is that data in the low-quality domain do not have as high a confidence score as those in the high-quality domain. The TrAdaBoost formulas for updating the instance weights are as follows:
$w_i^{t+1} = w_i^t \, \beta_t^{-|h_t(x_i) - c(x_i)|}$, if $x_i$ is in the high-quality domain

$w_i^{t+1} = w_i^t \, \beta^{|h_t(x_i) - c(x_i)|}$, if $x_i$ is in the low-quality domain

where $\beta_t$ and $\beta$ are multipliers calculated from the error rate and as in traditional AdaBoost, respectively, and $h_t(x_i)$ and $c(x_i)$ are the predicted and true labels of instance $x_i$.
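A rough sketch of one such update step is given below. The multipliers follow the original TrAdaBoost formulation [3] ($\beta_t$ from the weighted error rate of the current hypothesis, $\beta$ fixed from the number of low-quality instances and boosting iterations); the array layout and variable names are assumptions made for illustration.

import numpy as np

def tradaboost_weight_update(weights, preds, labels, is_high_quality,
                             error_rate, n_low, n_iters):
    """One TrAdaBoost-style weight update over all instances.

    weights, preds, labels: 1-D arrays over instances (labels in {0, 1}).
    is_high_quality: boolean mask marking high-quality-domain instances.
    error_rate: weighted error of the current hypothesis on the
    high-quality instances; n_low and n_iters follow TrAdaBoost [3].
    """
    beta_t = error_rate / (1.0 - error_rate)                     # error-rate multiplier
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_low) / n_iters))  # fixed low-quality multiplier
    miss = np.abs(preds - labels)                                # |h_t(x_i) - c(x_i)|
    return np.where(
        is_high_quality,
        weights * beta_t ** (-miss),  # mispredicted high-quality instances gain weight
        weights * beta ** miss,       # mispredicted low-quality instances lose weight
    )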