Database Reference
In-Depth Information
3 is defined as a quite rich sequence of elements
and attributes, among which we are interested
in: ComparisonMeasure for the distance or
similarity measure used for clustering,
ClusteringField for the attributes on which the
clustering is based, Cluster for describing each
resulting cluster, and modelName, modelClass,
algorithmName and numberOfClusters. In the example
shown in Figure 3, we keep only these.
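To make the retained subset concrete, the following is a minimal sketch in Python that assembles a reduced ClusteringModel element containing only the items listed above. It is not the content of Figure 3 and is not a schema-valid PMML document (mandatory parts such as the mining schema are left out). The attribute values, the field names t0h to t96h, the euclidean measure and the helper name build_clustering_model are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_clustering_model(model_name, algorithm_name, fields, cluster_names):
    """Build a reduced ClusteringModel element with only the retained items."""
    model = ET.Element("ClusteringModel", {
        "modelName": model_name,
        "modelClass": "centerBased",  # placeholder value (assumption)
        "algorithmName": algorithm_name,
        "numberOfClusters": str(len(cluster_names)),
    })
    # Distance or similarity measure used for clustering (assumed: euclidean).
    measure = ET.SubElement(model, "ComparisonMeasure", {"kind": "distance"})
    ET.SubElement(measure, "euclidean")
    # One ClusteringField per attribute on which the clustering is based.
    for field in fields:
        ET.SubElement(model, "ClusteringField", {"field": field})
    # One Cluster element per resulting cluster.
    for name in cluster_names:
        ET.SubElement(model, "Cluster", {"name": name})
    return model


model = build_clustering_model(
    "GSE6281_hclust",                  # hypothetical model name
    "hclust",                          # hypothetical algorithm name
    ["t0h", "t7h", "t48h", "t96h"],    # hypothetical field names
    ["Cluster_GSE6281_hclust_1", "Cluster_GSE6281_hclust_2"],
)
print(ET.tostring(model, encoding="unicode"))
```

Deriving numberOfClusters from the list of cluster names keeps the attribute and the Cluster children consistent by construction.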
A cluster model basically consists of a set of
clusters. The standard PMML representation of
clusters is rather simple, but extensions may be
defined through the Extension element.
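As a rough illustration of the extension mechanism, the sketch below attaches a PMML Extension element to a Cluster element. The extender, name and value attributes belong to the standard Extension element. The helper add_extension, the extender string and the choice of storing a space-separated gene list under the name "genes" are assumptions, and do not reproduce the layout of the extension described in this chapter.

```python
import xml.etree.ElementTree as ET


def add_extension(parent, name, value, extender="example-extender"):
    """Attach an Extension element to parent, after any existing Extension.

    Assumes existing Extension children sit at the front of the parent,
    which is where PMML content models place them.
    """
    position = sum(1 for child in parent if child.tag == "Extension")
    ext = ET.Element("Extension", {
        "extender": extender,  # hypothetical extender identifier
        "name": name,
        "value": value,
    })
    parent.insert(position, ext)
    return ext


cluster = ET.Element("Cluster", {"name": "Cluster_GSE6281_hclust_1"})
# Hypothetical encoding: gene symbols stored as one space-separated value.
add_extension(cluster, "genes", "CCL5 CCL19 CCL22")
print(ET.tostring(cluster, encoding="unicode"))
```

Because Extension accepts arbitrary content, richer structures (for example nested elements per annotation) could be used instead of a flat value attribute.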
We have extended this PMML representation
for hierarchical clustering. One required evaluation
of clustering models on gene expression data is to
check the semantic meaning of the clusters. Indeed,
when clusters of genes are elicited, one main issue
is to check their soundness and to understand the
common characteristics of their items (genes).
One obvious way is to search for the genes that are
annotated with concepts from domain ontologies such as GO.
Let us consider, as examples, the two clusters
Cluster_GSE6281_hclust_1 and Cluster_GSE6281_hclust_2,
which were elicited from the GSE6281 data when
searching for co-expressed genes. We applied a
hierarchical clustering model whose clusters gather
genes that have similar expression profiles over the
4 time points: 0h, 7h, 48h and 96h. Figures 4 and 5
describe each cluster: the column Probe gives the
probeset name, the column Symbol gives the gene
name, and the third column gives the gene description.
We observe, for instance, that cluster
Cluster_GSE6281_hclust_1 includes the genes
CCL5, CCL19 and CCL22, which are chemokines.
The GO annotations that cover most of the genes
in each cluster are partly shown in Figures 6
and 7. Each cluster is annotated with a list of its most
descriptive GO concepts. For each GO concept,
the number of annotated genes and the annotation
frequency in the cluster are given. These figures
illustrate cluster annotations from the Molecular
Function ontology of GO only.
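The per-concept figures described above can be sketched as follows. This is only an illustration under stated assumptions: the annotation frequency is taken to be the number of annotated genes divided by the cluster size, the function name annotate_cluster is hypothetical, and the concept labels concept_A to concept_C are placeholders, not real GO annotations of these genes.

```python
from collections import Counter


def annotate_cluster(cluster_genes, gene_to_concepts, top_n=None):
    """Return (concept, gene_count, frequency) rows for one cluster.

    frequency is assumed to be gene_count / cluster size. Rows are sorted
    by decreasing gene_count, then by concept label for a stable order.
    """
    total = len(cluster_genes)
    if total == 0:
        return []
    counts = Counter()
    for gene in cluster_genes:
        # set() guards against a concept being listed twice for one gene.
        counts.update(set(gene_to_concepts.get(gene, ())))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = [(concept, n, n / total) for concept, n in ordered]
    return rows if top_n is None else rows[:top_n]


# Placeholder annotations (hypothetical, for illustration only).
gene_to_concepts = {
    "CCL5": {"concept_A", "concept_B"},
    "CCL19": {"concept_A"},
    "CCL22": {"concept_A", "concept_C"},
}
rows = annotate_cluster(["CCL5", "CCL19", "CCL22"], gene_to_concepts)
for concept, n_genes, frequency in rows:
    print(f"{concept}\t{n_genes}\t{frequency:.2f}")
```

Keeping only the first few rows (via top_n) corresponds to listing the most descriptive concepts of a cluster, under the assumption that "most descriptive" means "annotating the most genes".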
The PMML extension we defined provides:
the list of gene names in the cluster,
and the list of GO concepts that annotate
its genes; annotations are split according
to the three ontologies in GO (''Molecular
Figure 3. Cluster representation in PMML