of whether to accept or reject a document for each profile. A utility function
is usually used to model user satisfaction and evaluate a system. A general
form of the linear utility function used in the recent Text REtrieval Conference
(TREC) Filtering Track (46) is shown below.
$$U = A_R \cdot R^+ + A_N \cdot N^+ + B_R \cdot R^- + B_N \cdot N^- \qquad (8.3)$$
This model corresponds to assigning a positive or negative value to each
element in the categories of Table 8.1, where $R^-$, $R^+$, $N^+$, and $N^-$
are the numbers of documents that fall into the corresponding categories, and
$A_R$, $A_N$, $B_R$, and $B_N$ are the credit or penalty for each element in
the category. Usually, $A_R$ is positive and $A_N$ is negative. In the TREC-9,
TREC-10, and TREC-11 Filtering Tracks, the following utility function was
used:
$$T11U = T10U = T9U = 2R^+ - N^+ \qquad (8.4)$$
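As a minimal sketch of how the linear utility above could be computed, the following Python snippet evaluates Equation 8.3 for arbitrary weights and specializes it to the TREC weights of Equation 8.4. The class name `FilteringCounts`, its field names, and the example counts are illustrative assumptions, not part of the TREC definition; the mapping of the fields to $R^+$, $N^+$, $R^-$, and $N^-$ assumes the usual reading (superscript + for delivered, - for not delivered).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilteringCounts:
    """Document counts for one profile (hypothetical container, names assumed)."""
    rel_delivered: int        # R+ : relevant documents delivered
    nonrel_delivered: int     # N+ : non-relevant documents delivered
    rel_undelivered: int      # R- : relevant documents not delivered
    nonrel_undelivered: int   # N- : non-relevant documents not delivered


def linear_utility(c, a_r, a_n, b_r=0.0, b_n=0.0):
    """Linear utility: U = A_R*R+ + A_N*N+ + B_R*R- + B_N*N-."""
    return (a_r * c.rel_delivered
            + a_n * c.nonrel_delivered
            + b_r * c.rel_undelivered
            + b_n * c.nonrel_undelivered)


def t11u(c):
    """TREC-9/10/11 utility: A_R = 2, A_N = -1, B_R = B_N = 0."""
    return linear_utility(c, a_r=2.0, a_n=-1.0)


# Made-up counts for a single profile, for illustration only.
counts = FilteringCounts(rel_delivered=10, nonrel_delivered=4,
                         rel_undelivered=5, nonrel_undelivered=81)
print(t11u(counts))  # 2*10 - 4 = 16.0
```

Note that with $B_R = B_N = 0$ the undelivered documents do not affect the score at all, which is why T11U on its own says nothing about recall.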
If we use the T11U utility measure directly and get the final result by
averaging across user profiles, profiles with many delivered documents will
dominate the final result. So a normalized version T11SU was also used in
TREC-11:
$$T11SU = \frac{\max\left(\frac{T11U}{MaxU},\, MinNU\right) - MinNU}{1 - MinNU} \qquad (8.5)$$
where $MaxU = 2 \cdot (R^+ + R^-)$ is the maximum possible utility,$^1$ and
$MinNU$ was set to $-0.5$ in TREC-11. If the scaled utility $T11U / MaxU$
falls below $MinNU$, $MinNU$ is used instead, which simulates the scenario in
which users stop using the system when the performance is too poor.$^2$
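The normalization can be illustrated with a short Python sketch. The function and parameter names are assumptions made for this example, and raising an error when a profile has no relevant documents (so that $MaxU = 0$) is a choice made here, not something the text specifies.

```python
def t11u(rel_delivered, nonrel_delivered):
    """Unnormalized TREC-11 utility: 2*R+ - N+."""
    return 2 * rel_delivered - nonrel_delivered


def t11su(rel_delivered, nonrel_delivered, rel_undelivered, min_nu=-0.5):
    """Normalized utility: (max(T11U/MaxU, MinNU) - MinNU) / (1 - MinNU)."""
    max_u = 2 * (rel_delivered + rel_undelivered)  # utility of a perfect filter
    if max_u == 0:
        # Assumption of this sketch: the measure is undefined without relevant documents.
        raise ValueError("MaxU is zero: profile has no relevant documents")
    scaled = t11u(rel_delivered, nonrel_delivered) / max_u
    return (max(scaled, min_nu) - min_nu) / (1 - min_nu)


# Made-up profiles for illustration only.
print(t11su(15, 0, 0))    # perfect filter -> 1.0
print(t11su(0, 100, 5))   # scaled utility far below MinNU, clipped -> 0.0
print(t11su(0, 0, 5))     # nothing delivered
print(t11su(10, 4, 5))    # typical mixed result
```

Because every profile is mapped into the interval from 0 to 1, averaging T11SU across profiles no longer lets profiles with many delivered documents dominate. A system that delivers nothing scores $(0 + 0.5) / 1.5 = 1/3$ under the TREC-11 setting, so scores below that level indicate a filter that does worse than staying silent.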
Notice that in a real scenario, we could define user-specific utility functions
to model user satisfaction and evaluate filtering systems. A better choice of
$A_R$, $A_N$, $B_R$, and $B_N$ would depend on the user, the task, and the
context. For example, a user reading news on a wireless phone may have less
tolerance for non-relevant documents delivered and prefer higher precision,
and thus use a utility function with a larger penalty for non-relevant
documents delivered, such as $U_{wireless} = R^+ - 3N^+$. A user doing
research on a certain topic may have a high tolerance for non-relevant
documents delivered and prefer high recall, and thus use a utility function
with a smaller penalty for non-relevant documents delivered, such as
$U_{research} = R^+ - 0.5N^+$. When monitoring potential terrorist activities,
missing relevant information can be very costly, so $B_R$ may be a large
negative value.
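To make the effect of user-specific weights concrete, the sketch below scores two hypothetical filtering systems under the $U_{wireless}$ and $U_{research}$ functions given above. The system names and their delivered-document counts are invented for illustration; only the weights come from the text.

```python
def utility(rel_delivered, nonrel_delivered, a_r, a_n):
    """Linear utility restricted to delivered documents: A_R*R+ + A_N*N+."""
    return a_r * rel_delivered + a_n * nonrel_delivered


# (A_R, A_N) pairs taken from the two example utility functions in the text.
WEIGHTS = {
    "wireless": (1.0, -3.0),   # U_wireless = R+ - 3 N+
    "research": (1.0, -0.5),   # U_research = R+ - 0.5 N+
}

# Hypothetical systems as (R+, N+): one conservative, one that delivers liberally.
SYSTEMS = {
    "precise": (20, 5),
    "broad": (40, 30),
}


def preferred_system(user):
    """Return the name of the system with the highest utility for this user type."""
    a_r, a_n = WEIGHTS[user]
    return max(SYSTEMS, key=lambda name: utility(*SYSTEMS[name], a_r, a_n))


for user in WEIGHTS:
    print(user, preferred_system(user))
```

Under these made-up counts the wireless user scores the conservative system at $20 - 15 = 5$ and the liberal one at $40 - 90 = -50$, while the research user scores them at $17.5$ and $25$, so the same pair of systems is ranked in opposite order depending on the utility function.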
In addition to the linear utility measure, other measures such as F-beta (46)
defined by van Rijsbergen and DET curves (37) are also used in the research
1 Notice the normalized version does take into consideration undelivered relevant documents.
Therefore, it also provides some information about the recall of the system implicitly.
2 This is not exactly the same, since in TREC the system is evaluated at the very end of
the filtering process.