Data Mining Trends and Research Frontiers - Data Mining: Concepts and Techniques

typically done through the discovery of patterns and trends by means such as statistical

pattern learning, topic modeling, and statistical language modeling. Text mining usually

requires structuring the input text (e.g., parsing, along with the addition of some

derived linguistic features and the removal of others, and subsequent insertion into a

database). This is followed by deriving patterns within the structured data, and evaluation

and interpretation of the output. “High quality” in text mining usually refers to a

combination of relevance, novelty, and interestingness.
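
The structure-then-mine workflow described above can be illustrated with a minimal sketch. It assumes a toy stop-word list and uses document frequency as a stand-in for "deriving patterns"; the names `STOP_WORDS`, `structure`, and `document_frequency` are hypothetical and do not come from the text.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}

def structure(text):
    """Structure raw text: lowercase it, split it into word tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def document_frequency(docs):
    """Derive a simple pattern: for each term, the number of documents containing it."""
    df = Counter()
    for doc in docs:
        df.update(set(structure(doc)))  # a set counts each term once per document
    return df

docs = [
    "Text mining is the discovery of patterns in text.",
    "Topic modeling is a text mining technique.",
    "Patterns and trends in the data.",
]
df = document_frequency(docs)
```

Terms that appear in many documents (here `text`, `mining`, and `patterns`) are candidates for further evaluation; a real system would add the linguistic features and interpretation steps the passage mentions.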

Typical text mining tasks include text categorization, text clustering, concept/entity

extraction, production of granular taxonomies, sentiment analysis, document summarization,

and entity-relation modeling (i.e., learning relations between named entities).

Other examples include multilingual data mining, multidimensional text analysis, contextual

text mining, and trust and evolution analysis in text data, as well as text mining

applications in security, biomedical literature analysis, online media analysis, and analytical

customer relationship management. Various kinds of text mining and analysis

software and tools are available in academic institutions, open-source forums, and

industry. Text mining often also uses WordNet, Semantic Web, Wikipedia, and other

information sources to enhance the understanding and mining of text data.

Mining Web Data

The World Wide Web serves as a huge, widely distributed, global information center for

news, advertisements, consumer information, financial management, education, government,

and e-commerce. It contains a rich and dynamic collection of information

about web page contents with hypertext structures and multimedia, hyperlink information,

and access and usage information, providing fertile sources for data mining. Web

mining is the application of data mining techniques to discover patterns, structures, and

knowledge from the Web. According to analysis targets, web mining can be organized

into three main areas: web content mining, web structure mining, and web usage mining.

Web content mining analyzes web content such as text, multimedia data, and structured

data (within web pages or linked across web pages). This is done to understand the

content of web pages, provide scalable and informative keyword-based page indexing,

entity/concept resolution, web page relevance and ranking, web page content summaries,

and other valuable information related to web search and analysis. Web pages

can reside either on the surface web or on the deep Web. The surface web is that portion

of the Web that is indexed by typical search engines. The deep Web (or hidden Web)

refers to web content that is not part of the surface web. Its contents are provided by

underlying database engines.
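
As one illustration of the keyword-based page indexing mentioned above, the following is a minimal sketch of an inverted index over page text. The page URLs, the AND-query semantics, and the function names `build_inverted_index` and `search` are assumptions made for this example, not a description of how any particular search engine works.

```python
import re
from collections import defaultdict

def build_inverted_index(pages):
    """Map each keyword to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every keyword in the query (AND semantics)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical pages standing in for crawled surface-web content.
pages = {
    "http://example.com/a": "Web mining discovers patterns from the Web",
    "http://example.com/b": "Text mining and web search",
    "http://example.com/c": "Graph mining methods",
}
index = build_inverted_index(pages)
```

Calling `search(index, "web mining")` returns the pages that contain both keywords. Relevance ranking, entity resolution, and summarization would be layered on top of such an index.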

Web content mining has been studied extensively by researchers, search engines, and

other web service companies. Web content mining can build links across multiple web

pages for individuals; therefore, it has the potential to inappropriately disclose personal

information. Studies on privacy-preserving data mining address this concern through

the development of techniques to protect personal privacy on the Web.

Web structure mining is the process of using graph and network mining theory

and methods to analyze the nodes and connection structures on the Web. It extracts

patterns from hyperlinks, where a hyperlink is a structural component that connects a
