For comparison with our approach, we used several other approaches, which we briefly introduce below.
All Returns/All Kept: As most of the purchases in the dataset are returns, we used a dummy algorithm that predicts every purchase as returned. This serves as a baseline for the accuracy measure. For the precision measure, we used the opposite logic and marked all purchases as kept.
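These two baselines amount to constant predictors. A minimal sketch of how they could be scored, assuming a binary label where 1 means a returned purchase and 0 a kept one (the label encoding, toy data, and the class for which precision is reported are our assumptions):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score

# Toy ground truth: 1 = purchase was returned, 0 = purchase was kept.
y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0])

# "All Returns": predict every purchase as returned (accuracy baseline).
y_all_returns = np.ones_like(y_true)
print("All Returns accuracy:", accuracy_score(y_true, y_all_returns))

# "All Kept": predict every purchase as kept (precision baseline,
# reported here for the "kept" class).
y_all_kept = np.zeros_like(y_true)
print("All Kept precision:", precision_score(y_true, y_all_kept, pos_label=0))
```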
KNN: K-Nearest Neighbor is a classification algorithm that uses the k examples of the training set most similar to the item to classify. Classification is done by a majority vote of these neighbors. In our evaluation, we chose k = 5 neighbors [2].
Naïve Bayes: A probabilistic classifier based on Bayes' theorem [39]. Naïve Bayes classifiers are, for instance, used in text categorization to classify items as spam or non-spam.
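For illustration, the following sketch shows how these two off-the-shelf classifiers could be set up with scikit-learn; the synthetic data and the choice of the Gaussian Naïve Bayes variant are assumptions made purely for this example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, precision_score

# Synthetic stand-in for the purchase data (250 features per item).
X, y = make_classification(n_samples=1000, n_features=250, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# KNN with k = 5 neighbors; classification by majority vote.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Naïve Bayes (Gaussian variant assumed for the continuous features).
nb = GaussianNB().fit(X_train, y_train)

for name, clf in [("KNN", knn), ("Naive Bayes", nb)]:
    y_pred = clf.predict(X_test)
    print(name, accuracy_score(y_test, y_pred), precision_score(y_test, y_pred))
```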
Ensemble: Ensembles combine different approaches, with an extra algorithm merging the results of the individual algorithms. In this evaluation, we used Stacking [49] as the combination methodology and combined the previously mentioned approaches with our CBR algorithm. Stacking was used by the two top teams of the Netflix Challenge [4, 6]. It uses an extra learning algorithm, in our case Logistic Regression, to make a final prediction out of the results of the other approaches.
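A stacked ensemble along these lines could be sketched as follows; for brevity the sketch combines only the off-the-shelf base learners (the CBR component is omitted) and uses Logistic Regression as the meta-learner, mirroring the setup described above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=250, random_state=0)

# Base learners; the CBR approach would be added here as a further estimator.
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("nb", GaussianNB()),
]

# Logistic Regression acts as the stacking meta-learner that turns the
# base learners' predictions into the final prediction.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X, y)
print(stack.predict(X[:5]))
```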
The evaluation was done on a subset of the dataset described in Sect. 8.4.1. We deleted all users with fewer than 20 items in their purchase history. This was done because the evaluation used a train/test split of each user's purchase history; with fewer than 20 items, not enough data would have been available to train and test the algorithms. We also removed some of the features, as they contained clear indicators of whether an item was kept or not. Of the 263 features, we removed 13, so that we ended up with 250 features per item.
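The preprocessing just described amounts to two simple filters; a sketch with pandas could look like the following, where the toy data, column names, and the removed leaky features are placeholders:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the purchase data: one row per item, user_id plus features.
rng = np.random.default_rng(0)
items = pd.DataFrame({
    "user_id": rng.integers(0, 50, size=2000),
    "feature_a": rng.random(2000),   # placeholder for a "leaky" feature
    "feature_b": rng.random(2000),   # placeholder for a "leaky" feature
    "price": rng.random(2000),
})

# Keep only users with at least 20 items in their purchase history.
counts = items.groupby("user_id")["user_id"].transform("size")
items = items[counts >= 20]

# Drop the features that directly reveal whether an item was kept
# (13 of the 263 features in the original data; names here are made up).
items = items.drop(columns=["feature_a", "feature_b"])
```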
For the results, we conducted five test runs. In each test run, we iterated over all users (3,747 users in the evaluation data), extracted each user's data from the dataset, randomly split the user's data into an 80% training set and a 20% test set, and then integrated the user's training data back into the dataset. As a result, we got a dataset consisting of all data except the separated test data of one user. This test set was then used to evaluate the results of the different approaches. This evaluation approach was chosen because we compare different types of algorithms. Our CBR only takes into account the data of one user; thus it needs only the 20 items of the user for evaluation. Other approaches, like Naïve Bayes, learn their model on the complete dataset to find discriminators for making the prediction. As we want to see how the different approaches work in a personalized setting (recommendations for one user), we chose the previously described data splitting. The results of the five test runs were then averaged.
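The per-user evaluation protocol can be sketched as follows; the placeholder classifier, the synthetic per-user data, and the user counts are purely illustrative, so the sketch only shows the splitting and averaging logic:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n_users, items_per_user, n_features = 50, 30, 250

precisions, accuracies = [], []
for run in range(5):                              # five test runs
    run_prec, run_acc = [], []
    for user in range(n_users):
        # Synthetic stand-in for one user's purchase history.
        X = rng.random((items_per_user, n_features))
        y = rng.integers(0, 2, items_per_user)    # 1 = returned, 0 = kept

        # 80/20 split of this user's items; in the real setup the training
        # part is merged back into the full dataset for the global learners.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=run)

        clf = GaussianNB().fit(X_tr, y_tr)        # placeholder model
        y_pred = clf.predict(X_te)
        run_prec.append(precision_score(y_te, y_pred, zero_division=0))
        run_acc.append(accuracy_score(y_te, y_pred))
    precisions.append(np.mean(run_prec))
    accuracies.append(np.mean(run_acc))

print("precision:", np.mean(precisions), "accuracy:", np.mean(accuracies))
```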
Figure 8.9 shows the results based on precision, and Figure 8.10 shows the results based on accuracy. We see that the accuracy performance differs from the precision performance.
If we look at the precision setting, we see that the Naïve Bayes approach performed poorly compared to the baseline and the other approaches. Our approach, CBR, performs well compared to the baseline. As we only use data from one user, building