Data Mining Trends and Research Frontiers - Data Mining: Concepts and Techniques

providers can be used to help consolidate the remaining, unreliable portions of the data. This reduces the costly efforts of labeling the data by hand and of training on massive, dynamic, real-world data sets.

Clustering and Classification of Graphs and Homogeneous Networks

Large graphs and networks have cohesive structures, which are often hidden among their massive, interconnected nodes and links. Cluster analysis methods have been developed for large networks to uncover network structures and to discover hidden communities, hubs, and outliers based on network topological structures and their associated properties. Various kinds of network clustering methods have been developed and can be categorized as partitioning, hierarchical, or density-based algorithms. Moreover, given human-labeled training data, the discovery of network structures can be guided by human-specified heuristic constraints. Supervised classification and semi-supervised classification of networks are active research topics in the data mining community.
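To make the idea of semi-supervised classification on a homogeneous network concrete, the following is a minimal sketch, not a method taken from the text. It assumes a simple majority-vote label propagation scheme: a few human-labeled seed nodes pass their labels to unlabeled neighbors, round by round. The function name propagate_labels, the toy graph, and the seed labels are all hypothetical.

```python
from collections import Counter


def propagate_labels(adjacency, seeds, max_rounds=10):
    """Give each unlabeled node the majority label of its labeled neighbors.

    adjacency: dict mapping node -> list of neighbor nodes (undirected graph).
    seeds: dict mapping node -> label for the few human-labeled nodes.
    Illustrative assumption: synchronous rounds, seed labels never change.
    """
    labels = dict(seeds)
    for _ in range(max_rounds):
        updates = {}
        for node in sorted(adjacency):
            if node in labels:
                continue
            votes = Counter(labels[n] for n in adjacency[node] if n in labels)
            if votes:
                # Highest vote count wins; ties fall back to label order.
                updates[node] = min(votes, key=lambda lab: (-votes[lab], lab))
        if not updates:
            break
        labels.update(updates)
    return labels


# Hypothetical toy graph: two triangles joined by one bridge edge (c - d).
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}
seeds = {"a": "left", "f": "right"}
result = propagate_labels(toy_graph, seeds)
print(result)
```

With one labeled node on each side, the labels spread through each triangle and meet at the bridge, so the two densely connected groups end up in different classes.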

Clustering, Ranking, and Classification of Heterogeneous Networks

A heterogeneous network contains interconnected nodes and links of different types. Such interconnected structures contain rich information, which can be used to mutually enhance nodes and links, and to propagate knowledge from one type to another. Clustering and ranking of such heterogeneous networks can be performed hand in hand: highly ranked nodes/links in a cluster may contribute more than their lower-ranked counterparts when the cohesiveness of the cluster is evaluated, and clustering, in turn, may help consolidate the high ranking of objects/links dedicated to the cluster. Such mutual enhancement of ranking and clustering prompted the development of an algorithm called RankClus. Moreover, users may specify different ranking rules or present labeled nodes/links for certain data types. Knowledge of one type can be propagated to other types. Such propagation reaches the nodes/links of the same type via heterogeneous-type connections. Algorithms have been developed for supervised learning and semi-supervised learning in heterogeneous networks.
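The mutual enhancement of ranking and clustering can be illustrated with a toy loop. The sketch below is not the published RankClus algorithm, which is more involved. It is a simplified stand-in that assumes a bi-typed network of venues and authors, ranks authors within each cluster by their share of link weight, and reassigns each venue to the cluster whose ranking best matches its authors. All names (rank_authors, cluster_rankings, rank_and_cluster) and the data are hypothetical.

```python
def rank_authors(links, venues):
    """Rank authors by their share of link weight within the given venues."""
    totals = {}
    for venue in venues:
        for author, weight in links[venue].items():
            totals[author] = totals.get(author, 0.0) + weight
    norm = sum(totals.values()) or 1.0
    return {author: weight / norm for author, weight in totals.items()}


def cluster_rankings(links, assignment, k):
    """Compute one author ranking per cluster from the current assignment."""
    return [
        rank_authors(links, [v for v, c in assignment.items() if c == i])
        for i in range(k)
    ]


def rank_and_cluster(links, assignment, k, max_iters=20):
    """Alternate ranking and reassignment until the clusters stop changing."""
    assignment = dict(assignment)
    for _ in range(max_iters):
        rankings = cluster_rankings(links, assignment, k)
        updated = {}
        for venue, authors in links.items():
            # Highly ranked authors contribute more to a cluster's score.
            scores = [
                sum(w * ranking.get(a, 0.0) for a, w in authors.items())
                for ranking in rankings
            ]
            updated[venue] = max(range(k), key=lambda i: scores[i])
        if updated == assignment:
            break
        assignment = updated
    return assignment, cluster_rankings(links, assignment, k)


# Hypothetical venue -> {author: number of papers} links.
links = {
    "venueA": {"ann": 3, "bob": 2},
    "venueB": {"ann": 1, "bob": 4, "cat": 1},
    "venueC": {"cat": 3, "dan": 2},
    "venueD": {"dan": 4, "cat": 1, "bob": 1},
}
initial = {"venueA": 0, "venueB": 0, "venueC": 0, "venueD": 1}
final_assignment, final_rankings = rank_and_cluster(links, initial, k=2)
print(final_assignment)
```

A loop of this kind can settle in a poor partition for some starting assignments, so the initial clusters matter. That sensitivity is a property of this sketch and says nothing about the behavior of RankClus itself.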

Role Discovery and Link Prediction in Information Networks

Many hidden roles or relationships exist among different nodes/links in a heterogeneous network. Examples include advisor-advisee and leader-follower relationships in a research publication network. To discover such hidden roles or relationships, experts can specify constraints based on their background knowledge. Enforcing such constraints may help with cross-checking and validation in large interconnected networks. Information redundancy in a network can often be used to help weed out objects/links that do not follow such constraints.
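As a small illustration of constraint-based weeding, the sketch below filters candidate advisor-advisee pairs in a publication network. The two constraints are assumptions chosen for the example, not rules stated in the text: the advisor must have started publishing at least a few years before the advisee, and the pair must have coauthored at least once. The function name, the min_gap parameter, and the data are hypothetical.

```python
def filter_advisor_candidates(candidates, first_pub_year, coauthored, min_gap=3):
    """Keep only candidate (advisor, advisee) pairs that satisfy both constraints.

    candidates: list of (advisor, advisee) pairs proposed by some earlier step.
    first_pub_year: dict mapping author -> year of first publication.
    coauthored: set of frozensets, one per pair of authors with a joint paper.
    min_gap: assumed minimum head start, in years, of advisor over advisee.
    """
    kept = []
    for advisor, advisee in candidates:
        gap = first_pub_year[advisee] - first_pub_year[advisor]
        has_joint_paper = frozenset((advisor, advisee)) in coauthored
        if gap >= min_gap and has_joint_paper:
            kept.append((advisor, advisee))
    return kept


# Hypothetical publication data.
first_pub_year = {"prof_x": 1995, "student_y": 2008, "peer_z": 2007}
coauthored = {
    frozenset(("prof_x", "student_y")),
    frozenset(("student_y", "peer_z")),
}
candidates = [
    ("prof_x", "student_y"),  # large gap and a joint paper
    ("peer_z", "student_y"),  # joint paper, but the gap is too small
    ("prof_x", "peer_z"),     # large gap, but no joint paper
]
kept = filter_advisor_candidates(candidates, first_pub_year, coauthored)
print(kept)
```

Each constraint draws on a different piece of redundant information in the network (publication dates and coauthorship links), so a candidate that slips past one check can still be caught by the other.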
