of whether to accept or reject a document for each profile. A utility function
is usually used to model user satisfaction and evaluate a system. A general
form of the linear utility function used in the recent Text REtrieval Conference
(TREC) Filtering Track (46) is shown below.
$$U = A_R \cdot R^+ + A_N \cdot N^+ + B_R \cdot R^- + B_N \cdot N^- \tag{8.3}$$
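This model corresponds to assigning a positive or negative value to each element,
where $R^+$, $N^+$, $R^-$, and $N^-$ correspond to the number of documents that fall
into the corresponding category (relevant delivered, non-relevant delivered,
relevant not delivered, and non-relevant not delivered), and $A_R$, $A_N$, $B_R$,
and $B_N$ correspond to the credit/penalty for each element in
the category. Usually, $A_R$ is positive, and $A_N$ is negative. In the TREC-9,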
TREC-10, and TREC-11 Filtering Tracks, the following utility function was
used:
$$\mathrm{T11U} = \mathrm{T10U} = \mathrm{T9U} = 2R^+ - N^+ \tag{8.4}$$
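As a minimal sketch of the linear utility above, the snippet below computes the general form and its TREC special case for a single profile. The class and function names (`FilteringCounts`, `linear_utility`, `t11u`) and the example counts are illustrative assumptions, not part of any TREC tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilteringCounts:
    """Document counts for one user profile (field names are hypothetical)."""
    rel_delivered: int      # R+: relevant documents delivered
    nonrel_delivered: int   # N+: non-relevant documents delivered
    rel_missed: int         # R-: relevant documents not delivered
    nonrel_rejected: int    # N-: non-relevant documents not delivered


def linear_utility(c, a_r, a_n, b_r=0.0, b_n=0.0):
    """General linear utility: A_R*R+ + A_N*N+ + B_R*R- + B_N*N-."""
    return (a_r * c.rel_delivered + a_n * c.nonrel_delivered
            + b_r * c.rel_missed + b_n * c.nonrel_rejected)


def t11u(c):
    """TREC-9/10/11 utility: A_R = 2, A_N = -1, B_R = B_N = 0."""
    return linear_utility(c, a_r=2.0, a_n=-1.0)


# Made-up counts for illustration only.
counts = FilteringCounts(rel_delivered=10, nonrel_delivered=4,
                         rel_missed=5, nonrel_rejected=81)
print(t11u(counts))  # 2*10 - 4 = 16.0
```

Keeping the four weights as parameters means the TREC measure is just one setting of the general function, so other weightings can be tried without changing the code.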
If we use the T11U utility measure directly and get the final result by
averaging across user profiles, profiles with many delivered documents will
dominate the final result. So a normalized version T11SU was also used in
TREC-11:
$$\mathrm{T11SU} = \frac{\max\left(\dfrac{\mathrm{T11U}}{\mathrm{MaxU}},\ \mathrm{MinNU}\right) - \mathrm{MinNU}}{1 - \mathrm{MinNU}} \tag{8.5}$$
where $\mathrm{MaxU} = 2 \cdot (R^+ + R^-)$ is the maximum possible utility,¹ and
$\mathrm{MinNU}$ was set to $-0.5$ in TREC-11. If the score is below $\mathrm{MinNU}$,
$\mathrm{MinNU}$ is used instead, which simulates the scenario that users stop using
the system when the performance is too poor.²
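The normalization can be sketched as follows, assuming the TREC-11 floor of MinNU = -0.5 as the default. The function names and the two example profiles are hypothetical; the second part illustrates how a profile with many delivered documents dominates an average of raw utilities but not an average of normalized ones.

```python
from statistics import mean


def t11u(rel_delivered, nonrel_delivered):
    """Raw utility 2*R+ - N+."""
    return 2 * rel_delivered - nonrel_delivered


def t11su(rel_delivered, nonrel_delivered, rel_missed, min_nu=-0.5):
    """Normalized utility; min_nu defaults to the TREC-11 value (assumed here)."""
    max_u = 2 * (rel_delivered + rel_missed)  # utility of a perfect system
    if max_u == 0:
        raise ValueError("profile has no relevant documents, so MaxU is zero")
    normalized = t11u(rel_delivered, nonrel_delivered) / max_u
    # Scores below min_nu are clipped to min_nu, then rescaled to [0, 1].
    return (max(normalized, min_nu) - min_nu) / (1 - min_nu)


# Made-up profiles as (R+, N+, R-): one large, one small.
profiles = [(200, 50, 100), (2, 3, 4)]

raw_mean = mean(t11u(r, n) for r, n, _ in profiles)
norm_mean = mean(t11su(r, n, m) for r, n, m in profiles)
print(raw_mean)   # 175.5, almost entirely due to the large profile
print(norm_mean)  # each profile contributes a value in [0, 1]
```

Under this sketch a system that delivers nothing scores one third rather than zero, and any profile whose normalized utility falls below the floor scores exactly zero.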
Notice that in a real scenario, we could define user-specific utility functions
to model user satisfaction and evaluate filtering systems. A better choice of
$A_R$, $A_N$, $B_R$, and $B_N$ would depend on the user, the task, and the context.
For example, when a user is reading news with a wireless phone, he may
have less tolerance for non-relevant documents delivered and prefer higher
precision, and thus use a utility function with larger penalty for non-relevant
documents delivered, such as
$U_{wireless} = R^+ - 3N^+$. When a user is doing
research about a certain topic, he may have a high tolerance for non-relevant
documents delivered and prefer high recall, and thus use a utility function with
less penalty for non-relevant documents delivered, such as
$U_{research} = R^+ - 0.5N^+$. When monitoring potential terrorist activities, missing
relevant information could be very costly, so $B_R$ may be a large negative value.
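To illustrate how the choice of weights changes which system looks better, the sketch below scores two made-up systems under the wireless and research utilities from the text. The `monitoring` weights, in particular B_R = -5, are a purely hypothetical choice standing in for "a large negative value"; the system counts are also invented.

```python
def utility(counts, a_r, a_n, b_r=0.0, b_n=0.0):
    """Linear utility A_R*R+ + A_N*N+ + B_R*R- + B_N*N-."""
    return (a_r * counts["R+"] + a_n * counts["N+"]
            + b_r * counts["R-"] + b_n * counts["N-"])


# Weight settings; "monitoring" uses an assumed B_R, not a value from the text.
WEIGHTS = {
    "wireless":   {"a_r": 1.0, "a_n": -3.0},
    "research":   {"a_r": 1.0, "a_n": -0.5},
    "monitoring": {"a_r": 1.0, "a_n": -0.5, "b_r": -5.0},
}

# Two invented systems evaluated on the same 10 relevant documents.
precise = {"R+": 6, "N+": 1, "R-": 4, "N-": 89}     # few false alarms
exhaustive = {"R+": 10, "N+": 8, "R-": 0, "N-": 82}  # misses nothing

for name, w in WEIGHTS.items():
    u_precise = utility(precise, **w)
    u_exhaustive = utility(exhaustive, **w)
    better = "precise" if u_precise > u_exhaustive else "exhaustive"
    print(f"{name}: precise={u_precise}, exhaustive={u_exhaustive}, prefer {better}")
```

With these numbers the high-precision system wins only under the wireless utility, while the research and monitoring utilities favor the high-recall system.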
In addition to the linear utility measure, other measures such as F-beta (46)
defined by van Rijsbergen and DET curves (37) are also used in the research
¹ Notice the normalized version does take into consideration undelivered relevant documents.
Therefore, it also provides some information about the recall of the system implicitly.
² This is not exactly the same, since in TREC the system is evaluated at the very end of
the filtering process.