The clusters are neither exclusive (e.g., where one gene can participate in multiple clusters) nor exhaustive (e.g., where a gene may not participate in any cluster).
Biclustering is useful not only in bioinformatics but also in other applications.
Consider recommender systems as an example.
Example 11.13 Using biclustering for a recommender system.
AllElectronics collects data from customers' evaluations of products and uses the data to recommend products to customers.
The data can be modeled as a customer-product matrix, where each row represents a
customer, and each column represents a product. Each element in the matrix represents
a customer's evaluation of a product, which may be a score (e.g., like, like somewhat,
not like) or purchase behavior (e.g., buy or not). Figure 11.4 illustrates the structure.
The customer-product matrix can be analyzed in two dimensions: the customer dimension and the product dimension. Treating each customer as an object and products as attributes, AllElectronics can find customer groups that have similar preferences or purchase patterns. Using products as objects and customers as attributes, AllElectronics can mine product groups that are similar in customer interest.
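The two views correspond to reading the same matrix by rows or by columns. The sketch below assumes a small hypothetical purchase matrix (1 = buy, 0 = not) and uses simple position-wise agreement as a stand-in similarity measure; the text does not prescribe a particular measure.

```python
# Hypothetical purchase matrix: rows are customers, columns are products.
buys = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

def transpose(matrix):
    """Swap rows and columns so that products become the objects."""
    return [list(column) for column in zip(*matrix)]

def agreement(u, v):
    """Fraction of positions where two equal-length vectors match.

    This is an assumed, deliberately simple similarity measure.
    """
    return sum(a == b for a, b in zip(u, v)) / len(u)

# Customer dimension: customers are objects, products are attributes.
customer_similarity = agreement(buys[0], buys[1])

# Product dimension: products are objects, customers are attributes.
by_product = transpose(buys)
product_similarity = agreement(by_product[0], by_product[1])
```

Any clustering method that works on object-attribute data can then be applied to `buys` for customer groups or to `by_product` for product groups.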
Moreover, AllElectronics can mine clusters in both customers and products simultaneously. Such a cluster contains a subset of customers and involves a subset of products. For example, AllElectronics is highly interested in finding a group of customers who all like the same group of products. Such a cluster is a submatrix in the customer-product matrix, where all elements have a high value. Using such a cluster, AllElectronics can make recommendations in two directions. First, the company can recommend products to new customers who are similar to the customers in the cluster. Second, the company can recommend to customers new products that are similar to those involved in the cluster.
As with biclusters in a gene expression data matrix, the biclusters in a customer-product matrix usually have the following characteristics:
Only a small set of customers participate in a cluster.
A cluster involves only a small subset of products.
A customer can participate in multiple clusters, or may not participate in any cluster.
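$$
\begin{array}{c}
\text{Products} \\
\text{Customers}\;
\begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1m} \\
w_{21} & w_{22} & \cdots & w_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
w_{n1} & w_{n2} & \cdots & w_{nm}
\end{bmatrix}
\end{array}
$$
Figure 11.4 Customer-product matrix.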