1 Overview
Anne Kao and Stephen R. Poteet
1.1 Introduction
Text mining is the discovery and extraction of interesting, non-trivial
knowledge from free or unstructured text. This encompasses everything from
information retrieval (i.e., document or web site retrieval) to text classification
and clustering, to (somewhat more recently) entity, relation, and event
extraction. Natural language processing (NLP) is the attempt to extract a fuller
meaning representation from free text. This can be put roughly as figuring
out who did what to whom, when, where, how and why. NLP typically makes
use of linguistic concepts such as part-of-speech (noun, verb, adjective, etc.)
and grammatical structure (either represented as phrases like noun phrase
or prepositional phrase, or dependency relations like subject-of or object-of).
It has to deal with anaphora (which previous noun a pronoun or other
back-referring phrase corresponds to) and ambiguities (both of words and of
grammatical structure, such as what is being modified by a given word or
prepositional phrase). To do this, it makes use of various knowledge
representations, such as a lexicon of words with their meanings and grammatical
properties, a set of grammar rules, and often other resources such as an
ontology of entities and actions, or a thesaurus of synonyms or abbreviations.
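
As a rough illustration of how a lexicon with part-of-speech information can
support the "who did what to whom" reading described above, here is a minimal
Python sketch. The lexicon entries, the tag names, and the functions tag and
who_did_what_to_whom are all hypothetical and chosen only for this example; a
real NLP system would use a far larger lexicon, grammar rules, and handling
for anaphora and ambiguity.

```python
# Toy lexicon mapping words to a part of speech.
# Every entry and tag name here is an illustrative assumption.
LEXICON = {
    "the": "DET", "a": "DET",
    "analyst": "NOUN", "report": "NOUN", "engineer": "NOUN",
    "wrote": "VERB", "reviewed": "VERB",
}


def tag(sentence):
    """Look up each whitespace-separated token in the lexicon.

    Words missing from the lexicon are tagged 'UNK'.
    """
    return [(tok, LEXICON.get(tok.lower(), "UNK")) for tok in sentence.split()]


def who_did_what_to_whom(sentence):
    """Return (subject, verb, object) for a simple noun-verb-noun sentence.

    The subject is taken to be the last noun before the first verb and the
    object the first noun after it. Returns None if any part is missing.
    """
    tagged = tag(sentence)
    verb_idx = next((i for i, (_, pos) in enumerate(tagged) if pos == "VERB"), None)
    if verb_idx is None:
        return None
    subjects = [tok for tok, pos in tagged[:verb_idx] if pos == "NOUN"]
    objects = [tok for tok, pos in tagged[verb_idx + 1:] if pos == "NOUN"]
    if not subjects or not objects:
        return None
    return (subjects[-1], tagged[verb_idx][0], objects[0])


if __name__ == "__main__":
    print(who_did_what_to_whom("The analyst wrote a report"))
```

The positional heuristic works only for simple active sentences; passives,
pronouns, and attachment ambiguities are exactly the cases where the fuller
linguistic machinery described above becomes necessary.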
This book has several purposes. First, we want to explore the use of NLP
techniques in text mining, as well as some other technologies that are novel to
the field of text mining. Second, we wish to explore novel ways of integrating
various technologies, old or new, to solve a text mining problem. Next, we
would like to look at some new applications for text mining. Finally, we have
several chapters that provide various supporting techniques for either text
mining or NLP or both, or enhancements to existing techniques.