Table 3. Statistics of quality judgments
Measure                                                               Pages
Pages judged                                                            143
Pages judged good quality by both judges                                 53
Pages judged good quality by one judge but not the other                 11
Pages judged good quality after instructor review of disagreements        5
Final number of good-quality pages in the wiki                           58
Final number of ordinary pages (not granted "good quality" status)       85
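The short Python sketch below re-derives the final counts in Table 3 from the judgment tallies as a consistency check. It assumes that every page was rated by both judges. The table implies this but does not state it outright. Under that assumption, the pages that neither judge rated as good are simply the remainder. The helper name `summarize` is hypothetical. "Raw agreement" here means the share of pages on which the two judges gave the same rating.

```python
# Tallies transcribed from Table 3 (statistics of quality judgments).
JUDGED = 143            # pages judged
GOOD_BOTH = 53          # rated good quality by both judges
GOOD_ONE = 11           # rated good quality by exactly one judge
GOOD_AFTER_REVIEW = 5   # disputed pages rated good after instructor review


def summarize(judged, good_both, good_one, good_after_review):
    """Derive final counts and raw inter-judge agreement (hypothetical helper).

    Assumes every page was rated by both judges, so pages rated good by
    neither judge are whatever remains after the two tallies above.
    """
    good_neither = judged - good_both - good_one         # both judges: not good
    final_good = good_both + good_after_review           # consensus + resolved disputes
    final_ordinary = judged - final_good                 # everything else is ordinary
    raw_agreement = (good_both + good_neither) / judged  # share of pages rated alike
    return {
        "good_neither": good_neither,
        "final_good": final_good,
        "final_ordinary": final_ordinary,
        "raw_agreement": raw_agreement,
    }


stats = summarize(JUDGED, GOOD_BOTH, GOOD_ONE, GOOD_AFTER_REVIEW)
print(stats["final_good"], stats["final_ordinary"])  # 58 85
print(f"{stats['raw_agreement']:.3f}")               # 0.923
```

Under that assumption, the judges agreed on 132 of the 143 pages, or about 92%. A chance-corrected statistic such as Cohen's kappa would also require each judge's individual count of good-quality ratings, and Table 3 does not report those counts.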
ship existed between:

1. The quality of the wiki pages, and
2. (A) the HTTP access data for the wiki pages, recorded in the HTTP log file; (B) the collaborative statistics recorded in the wiki log file; and (C) the textual features of the wiki pages.

The objective was to identify variables that were highly correlated with the quality of content created through open collaborative knowledge building. Those variables would then help build an automatic mechanism that can locate content that may be of low quality. Table 3 lists the categories of variables used in the correlation analysis.

Data Analysis and Results

We applied a range of statistical methods to analyze the data and identify factors that affect the quality of a wiki page. Based on this analysis, we applied a machine learning method to construct an automatic quality predictor. The following sections report the findings.

At the exploratory stage of the analysis, we started with parametric statistical methods, which assume normally distributed data. We used them to understand the relationship between the possible predictive features (as summarized in Table 3) and the quality of a wiki page (as summarized in Table 4). The goal was to determine which technique would classify the pages most accurately into two groups: high-quality pages and ordinary pages. Of the many techniques we tried, linear discriminant analysis (Klecka 1980) gave us a clear picture of how effectively the predictive variables estimate page quality.

Discriminant Analysis and Stepwise Variable Selection

In discriminant analysis, we sought a linear combination of the frequencies of all the variables summarized in Table 3 (denoted by $v_i$ in the following equation). This combination served as the basis for assigning pages to the two groups (high-quality pages vs. ordinary pages):

$$S = b_0 + \sum_i b_i v_i$$

Here $S$ is the discriminant score and $b_0, b_1, b_2, \ldots$ are the coefficients. The coefficients were chosen to maximize the ratio of the between-group sum of squares to the within-group sum of squares:

$$\frac{\sum_{i \in \mathrm{High}} \left(\bar{S}_{\mathrm{all}} - \bar{S}_{\mathrm{High}}\right)^2 + \sum_{j \in \mathrm{Low}} \left(\bar{S}_{\mathrm{all}} - \bar{S}_{\mathrm{Low}}\right)^2}{\sum_{i \in \mathrm{High}} \left(S_i - \bar{S}_{\mathrm{High}}\right)^2 + \sum_{j \in \mathrm{Low}} \left(S_j - \bar{S}_{\mathrm{Low}}\right)^2}$$

High and Low are the sets of high-quality and ordinary pages. $S_i$, $S_j$, and $S_k$ are the scores of individual pages. $N_{\mathrm{High}}$, $N_{\mathrm{Low}}$, and $N_{\mathrm{all}}$ are the numbers of high-quality, ordinary, and all pages. The mean scores are

$$\bar{S}_{\mathrm{all}} = \frac{1}{N_{\mathrm{all}}} \sum_{k=1}^{N_{\mathrm{all}}} S_k, \qquad \bar{S}_{\mathrm{High}} = \frac{1}{N_{\mathrm{High}}} \sum_{i \in \mathrm{High}} S_i, \qquad \bar{S}_{\mathrm{Low}} = \frac{1}{N_{\mathrm{Low}}} \sum_{j \in \mathrm{Low}} S_j$$
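For two groups, the coefficients that maximize this between-group to within-group ratio have a closed form. This is Fisher's linear discriminant: $b$ is proportional to $S_W^{-1}(m_{\mathrm{High}} - m_{\mathrm{Low}})$. In that formula, $m_{\mathrm{High}}$ and $m_{\mathrm{Low}}$ are the group mean vectors of the predictors, and $S_W$ is the pooled within-group scatter matrix. The NumPy sketch below is illustrative only and is not the authors' implementation. The predictor values are random, and only the group sizes (58 and 85) come from Table 3. The function names `fisher_lda` and `score_ratio` are hypothetical. The sketch also does not perform the stepwise variable selection named in the heading.

```python
import numpy as np


def score_ratio(scores, is_high):
    """Between-group / within-group sum of squares of discriminant scores S."""
    s_high, s_low = scores[is_high], scores[~is_high]
    s_all = scores.mean()
    # Each page adds the squared gap between its group mean and the grand mean.
    between = (len(s_high) * (s_all - s_high.mean()) ** 2
               + len(s_low) * (s_all - s_low.mean()) ** 2)
    # Each page adds its squared deviation from its own group mean.
    within = (((s_high - s_high.mean()) ** 2).sum()
              + ((s_low - s_low.mean()) ** 2).sum())
    return between / within


def fisher_lda(X, is_high):
    """Two-group Fisher discriminant; returns (b0, b) so that S = b0 + X @ b."""
    X_high, X_low = X[is_high], X[~is_high]
    m_high, m_low = X_high.mean(axis=0), X_low.mean(axis=0)
    # Pooled within-group scatter matrix S_W.
    S_w = (X_high - m_high).T @ (X_high - m_high) + (X_low - m_low).T @ (X_low - m_low)
    # b is proportional to S_W^{-1} (m_high - m_low); solve rather than invert.
    b = np.linalg.solve(S_w, m_high - m_low)
    # Put the cut-off S = 0 halfway between the two projected group means.
    b0 = -0.5 * (m_high + m_low) @ b
    return b0, b


# Synthetic illustration: random predictors, NOT the study's data.
rng = np.random.default_rng(0)
n_high, n_low, n_vars = 58, 85, 4   # group sizes from Table 3; 4 variables is arbitrary
X = np.vstack([rng.normal(1.0, 1.0, size=(n_high, n_vars)),   # "high-quality" pages
               rng.normal(0.0, 1.0, size=(n_low, n_vars))])   # "ordinary" pages
is_high = np.concatenate([np.ones(n_high, dtype=bool), np.zeros(n_low, dtype=bool)])

b0, b = fisher_lda(X, is_high)
S = b0 + X @ b                      # discriminant score for each page
print("between/within ratio:", score_ratio(S, is_high))
print("training accuracy:", np.mean((S > 0) == is_high))
```

$b_0$ shifts every score by the same amount, so it cancels out of the ratio. Its only role is to set the classification cut-off. Here it is placed midway between the projected group means. That choice ignores the unequal group sizes, and a cut-off that accounted for them would shift slightly.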
 