Table 3. Statistics of the collected sequences

Data set          No. of sequences   No. of distinct questions      %
Earth sciences           61                    90                 28.75
Nutrition                46                    94                 29.56
Homeschooling            56                    84                 43.98
3.3 Evaluation Metrics
Evaluating a recommender system on its prediction power is crucial, but insufficient for deploying a good recommendation engine [17]. Other measures reflect further aspects of recommendation quality; however, not every recommender is expected to perform well on all of them.
Therefore, the evaluation of the LoR model should be based not on prediction performance (accuracy and average log-loss) alone, but also on other metrics that capture desired properties of a learning-oriented recommender within a QA system. Let us briefly define these metrics.
Catalog Coverage. In general, catalog coverage represents the proportion of questions that the recommendation model can recommend. In our case, we define the catalog coverage as the proportion of questions that the model $P$ can recommend with a prediction value higher than a predefined threshold $\tau$.
Overall, all three recommender models introduced in Section 2 can generate recommendations for any user (i.e., full user space coverage) and, eventually, every question can be recommended, since the recommender repeatedly excludes already visited ones. However, as the database approaches exhaustion, the remaining recommendations carry very low prediction values and become unreliable. Therefore, we introduce the prediction threshold $\tau$.
In our evaluation, we generally set $\tau$ to the lowest prediction value among the questions within the sequences used for training. Since the user space coverage is equal for all recommender models, we will further refer to catalog coverage simply as “coverage”.
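As a minimal sketch of this metric, assuming a hypothetical mapping from question ids to the prediction values of the model $P$ (the function and data names below are illustrative, not from the paper):

```python
def catalog_coverage(predictions, tau):
    """Proportion of catalog questions that the model can recommend
    with a prediction value above the threshold tau (hypothetical helper).

    predictions: dict mapping question id -> prediction value of model P
    tau:         prediction threshold; here taken as the lowest prediction
                 value among the questions in the training sequences
    """
    recommendable = sum(1 for value in predictions.values() if value > tau)
    return recommendable / len(predictions)


# Illustrative values only: tau is the minimum prediction over the
# training questions, as described above.
train_values = [0.31, 0.12, 0.58]
tau = min(train_values)                      # tau = 0.12
preds = {"q1": 0.92, "q2": 0.40, "q3": 0.05}
print(catalog_coverage(preds, tau))          # 0.666... (2 of 3 questions)
```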
Diversity. Generally, diversity is defined as the opposite of similarity. Within this context, we define diversity as the average dissimilarity over all question pairs within a recommendation.
Let $s$ be a question sequence context. Then the diversity of $R(s)$ is defined as

\[
\operatorname{div}(R(s)) \;=\; \frac{2}{N\,(N-1)} \sum_{\substack{(q_i,\,q_j)\,\in\, R(s)\\ i<j}} \bigl[\,1 - \operatorname{sim}_q(q_i, q_j)\,\bigr], \tag{13}
\]

where $N = |R(s)|$ denotes the number of recommended questions and $\operatorname{sim}_q : \mathcal{Q} \times \mathcal{Q} \rightarrow [0,1]$ represents the semantic similarity measure between questions.
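A minimal sketch of Eq. (13), assuming questions are represented as sparse term-weight vectors (dicts) and plugging in plain cosine similarity as $\operatorname{sim}_q$; the Lin concept similarity [14] would slot in the same way. All names and values are illustrative:

```python
import math
from itertools import combinations

def cosine_sim(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def diversity(recommendation, sim_q=cosine_sim):
    """Average pairwise dissimilarity of a recommendation R(s), Eq. (13):
    div = 2 / (N (N-1)) * sum over pairs i<j of [1 - sim_q(q_i, q_j)]."""
    n = len(recommendation)
    if n < 2:
        return 0.0  # diversity is not meaningful for fewer than two questions
    total = sum(1.0 - sim_q(qi, qj)
                for qi, qj in combinations(recommendation, 2))
    return 2.0 * total / (n * (n - 1))

# Example with three toy question vectors.
q1 = {"soil": 0.8, "erosion": 0.6}
q2 = {"soil": 0.7, "water": 0.7}
q3 = {"diet": 1.0}
print(diversity([q1, q2, q3]))  # closer to 1.0 means a more diverse list
```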
During the evaluation, we used simple cosine similarity together with the semantic concept similarity defined by Lin [14]. In order to avoid further