Information Technology Reference
In-Depth Information
In this case, respondents' sensation patterns are represented by orders, and analysis
techniques for orders are required.
Orders are also useful when the absolute level of observations cannot be calibrated.
For example, when analyzing DNA microarray data, in order that the same fluoresce
level represents the same level of gene expression, experimental conditions must be
calibrated. However, DNA databases may consist of data sampled under various condi-
tions. Even in such cases, the higher level of fluoresce surely corresponds to the higher
level of gene expression. Therefore, by treating the values in the microarray data as or-
dinal values, non-calibrated data would be processed. Fujibuchi et al. adopted such use
of orders in searching a gene expression database for similar cell types [2].
And clustering is the task of partitioning a sample set into clusters having the prop-
erties of internal cohesion and external isolation [3]. This method is a basic tool for
exploratory data analysis. Clustering methods for orders are useful for revealing the
group structure of data represented by orders such as those described above.
To cluster a set of orders, classical clustering has been mainly used [4, chapter 2].
In these studies, clustering methods were applied to ordinal data of a social survey,
sensory test, etc. These data sets have been small in size; the number of objects to be
sorted and the length of orders are at most ten, and the number of orders to be clustered
are at most thousands. This is because an scoring method has been used to acquire
responses for a large-scale survey. Responses can easily be collected by requesting for
respondents to mark on rating scales that are printed on paper questionnaire forms.
On the other hand, using printed questionnaire forms is not appropriate for ranking
method, because respondents must rewrite entire response orders when they want to
correct them. Therefore, in a ranking method, respondents generally reply by sorting
real objects. For example, respondents are requested to sort glasses of wine according
to their preference. However, it would be costly to prepare so many glasses. Due to this
reason, ranking method has been used for a small-scale survey, even if its advantage
to an scoring method is known as described above. But now, adoption of computer
interface clear this obstacle in using a ranking method. Respondents can sort virtual
objects instead of real objects. Further, methods to implicitly collect preference orders
have proposed [5, 6]. These technical progress has made it easier to collect the large
number of ordinal data.
We can now collect a large-scale data that consist of orders. However, current tech-
niques for clustering orders are not fully scalable. For example, to cluster a set of orders,
dissimilarities are first calculated for all pairs of orders, and agglomerative hierarchical
clustering techniques are applied. This approach is computationally inefficient, because
computational cost of agglomerative hierarchical clustering is O ( N 2 log( N ))) under
non-Euclidean metric [7], where N is the number of orders to be clustered. To alleviate
this inefficiency in terms of N , we proposed a k -means-type algorithm k -o'means in
our previous work [8]. The computational complexity was reduced to O ( N ) in terms of
the number of orders. Though this method successfully extracted a grouping structure in
a set of orders, it was not efficient yet, if the number of possible objects to be sorted was
large. In this paper, to alleviate this inefficiency, we propose a new method, k -o'means-
EBC. Note that EBC means Expected Borda Count , which is a classic method to
find an order so as to be as concordant as possible with a given set of orders. And
 
Search WWH ::




Custom Search