After applying the Kruskal-Wallis test, eighteen variables have been selected: desc_length, page_height, mouse_distance, mouse_clicks, lb_mouse_clicks, vertical_scroll, prod_desc_time, prod_review_time, prod_recommend_time, prod_other_time, page_time, tab_active_time, user_active_time, rel_page_time, rel_user_active_time, rel_prod_desc_time, rel_prod_review_time and rel_vertical_scroll.
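As an illustration of this screening step, the sketch below applies the Kruskal-Wallis test to each candidate variable, assuming a pandas data frame with one row per product page visit and an "interest" column holding the five-point rating; the data frame, column names and significance threshold are assumptions, not the authors' code.

```python
# Minimal sketch of Kruskal-Wallis feature screening (illustrative only).
import pandas as pd
from scipy.stats import kruskal

def kruskal_screen(df: pd.DataFrame, target: str = "interest",
                   alpha: float = 0.05) -> list[str]:
    """Keep features whose distributions differ significantly across ratings."""
    selected = []
    for col in df.columns.drop(target):
        # One sample of the feature per interest rating (1..5)
        groups = [g[col].dropna().values for _, g in df.groupby(target)]
        stat, p = kruskal(*groups)
        if p < alpha:
            selected.append(col)
    return selected

# Example: selected = kruskal_screen(sessions)
```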
The classification model of product interest was then built with the SAS Enterprise Miner (SAS EM) software. An equal number of cases in each predicted class (rating) was selected for the tree learning procedure. The predictive capability of the extracted models was assessed with 10-fold cross-validation. The parameter influencing the depth of the resulting tree (the maximum number of branches) was set to 3, and the minimum leaf size was set experimentally to 20 cases. The resulting tree had a misclassification rate of 59,2% and showed much better predictive accuracy than a random model, whose misclassification rate is 80%. The confusion matrix (Table 5) shows that the best predictive accuracy is achieved for the interest ratings of 5, 2 and 1. The tree thus appears to have good potential for predicting user interest, which we confirm below. The most informative variables turned out to be vertical_scroll, prod_other_time, page_height, mouse_distance, prod_desc_time and tab_active_time.
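Outside SAS EM, the same modelling setup can be sketched roughly as follows: balance the classes, restrict the leaf size, and estimate the error with 10-fold cross-validation. Note that scikit-learn trees use binary splits only, so the SAS EM setting of at most 3 branches per node has no direct equivalent here; all names below are illustrative assumptions rather than the paper's configuration.

```python
# Sketch of the tree-learning step with balanced classes and 10-fold CV.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def fit_interest_tree(df: pd.DataFrame, features: list[str],
                      target: str = "interest"):
    # Equal number of cases per rating, as in the paper
    n = df[target].value_counts().min()
    balanced = df.groupby(target, group_keys=False).sample(n=n, random_state=0)

    X, y = balanced[features], balanced[target]
    tree = DecisionTreeClassifier(min_samples_leaf=20, random_state=0)
    scores = cross_val_score(tree, X, y, cv=10, scoring="accuracy")
    print(f"10-fold accuracy: {scores.mean():.3f} "
          f"(misclassification {1 - scores.mean():.3f})")
    return tree.fit(X, y)
```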
Table 5. Confusion matrix for interest decision tree

                          Predicted interest
Actual interest      1     2     3     4     5    Classification accuracy
      1             54    30    11    11    24    41,5%
      2             21    69    15    14    11    53,1%
      3             16    40    34    14    26    26,2%
      4             18    31    15    38    28    29,2%
      5              7    29    15     9    70    53,8%
      Sum          116   199    90    86   159
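The classification accuracies in Table 5 are simply the diagonal counts divided by the corresponding row sums, and the overall misclassification rate follows from the trace of the matrix; a quick check:

```python
# Recompute the per-class accuracies and the overall error from Table 5.
import numpy as np

cm = np.array([            # rows: actual 1..5, columns: predicted 1..5
    [54, 30, 11, 11, 24],
    [21, 69, 15, 14, 11],
    [16, 40, 34, 14, 26],
    [18, 31, 15, 38, 28],
    [ 7, 29, 15,  9, 70],
])
per_class = np.diag(cm) / cm.sum(axis=1)
print(np.round(per_class * 100, 1))       # [41.5 53.1 26.2 29.2 53.8]
overall_error = 1 - np.trace(cm) / cm.sum()
print(round(overall_error * 100, 1))      # 59.2 (misclassification rate)
```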
Recommendation algorithms commonly take as input interest expressed on a binary scale. For this reason, we converted the interest scale from five-point nominal to binary: the interest values 1 and 2 were replaced by 0 (very low or no interest in the product), and the values 3, 4 and 5 were replaced by 1, expressing positive interest. A decision tree was built anew with SAS EM, setting the minimum leaf size to 20 and the maximum number of branches to 3. The resulting tree was characterized by very good parameters: an Area Under the Curve (AUC) of 0,735, a misclassification rate of 31%, a sensitivity of 0,742 and a specificity of 0,636. The most significant nodes in the resulting tree this time were vertical_scroll, page_time, user_active_time, search_referral and tab_active_time. The last three indicators, proposed by us, also show a positive correlation with user interest.
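A rough sketch of the binary reformulation and its evaluation metrics (AUC, sensitivity, specificity) is given below, assuming the same kind of data frame as above; the classifier, train/test split and function names are illustrative assumptions, not the SAS EM procedure.

```python
# Sketch: map ratings 1-2 -> 0 and 3-5 -> 1, then evaluate a tree on a holdout.
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def evaluate_binary_interest(df, features, target="interest"):
    # 1, 2 -> 0 (very low or no interest); 3, 4, 5 -> 1 (positive interest)
    y = (df[target] >= 3).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[features], y, test_size=0.3, stratify=y, random_state=0)
    tree = DecisionTreeClassifier(min_samples_leaf=20,
                                  random_state=0).fit(X_tr, y_tr)

    proba = tree.predict_proba(X_te)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_te, tree.predict(X_te)).ravel()
    return {
        "auc": roc_auc_score(y_te, proba),
        "misclassification": (fp + fn) / (tn + fp + fn + tp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```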