Overview - Natural Language Processing and Text Mining

1 Overview

Anne Kao and Stephen R. Poteet

1.1 Introduction

Text mining is the discovery and extraction of interesting, non-trivial knowledge from free or unstructured text. This encompasses everything from information retrieval (i.e., document or web site retrieval) to text classification and clustering, to (somewhat more recently) entity, relation, and event extraction. Natural language processing (NLP) is the attempt to extract a fuller meaning representation from free text. This can be put roughly as figuring out who did what to whom, when, where, how, and why. NLP typically makes use of linguistic concepts such as part of speech (noun, verb, adjective, etc.) and grammatical structure (represented either as phrases, like noun phrase or prepositional phrase, or as dependency relations, like subject-of or object-of). It has to deal with anaphora (which previous noun a pronoun or other back-referring phrase corresponds to) and ambiguities (both of words and of grammatical structure, such as what is being modified by a given word or prepositional phrase). To do this, it makes use of various knowledge representations, such as a lexicon of words with their meanings and grammatical properties, a set of grammar rules, and often other resources such as an ontology of entities and actions or a thesaurus of synonyms or abbreviations.
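As a rough illustration of the ideas above, the following Python sketch pairs a tiny lexicon (words with their parts of speech) with a single hand-written rule to recover "who did what to whom" from a simple sentence. Everything in it is an assumption made for illustration: the lexicon entries, the tag names, and the functions `tag` and `who_did_what_to_whom` are hypothetical and do not come from the text or from any NLP library. Real systems use much larger lexicons and full grammars, and must handle the anaphora and ambiguity that this sketch ignores.

```python
# Toy lexicon: word -> part of speech (hypothetical entries, illustration only).
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "mechanic": "NOUN", "engine": "NOUN",
    "chased": "VERB", "repaired": "VERB",
}


def tag(sentence):
    """Lowercase the sentence, drop a trailing period, split on whitespace,
    and look each token up in the lexicon; unknown words are tagged 'UNK'."""
    tokens = sentence.lower().rstrip(".").split()
    return [(tok, LEXICON.get(tok, "UNK")) for tok in tokens]


def who_did_what_to_whom(tagged):
    """Return (subject, verb, object) using one naive rule: the first noun
    among the words before the first verb is the subject, and the first noun
    after that verb is the object. Returns None if there is no verb."""
    verb_idx = next(
        (i for i, (_, pos) in enumerate(tagged) if pos == "VERB"), None
    )
    if verb_idx is None:
        return None
    subject = next((w for w, pos in tagged[:verb_idx] if pos == "NOUN"), None)
    obj = next((w for w, pos in tagged[verb_idx + 1:] if pos == "NOUN"), None)
    return (subject, tagged[verb_idx][0], obj)


if __name__ == "__main__":
    tagged = tag("The mechanic repaired the engine.")
    print(tagged)
    print(who_did_what_to_whom(tagged))
```

The single rule here stands in for the subject-of and object-of dependency relations mentioned above. It breaks down on passives, pronouns, and attachment ambiguities, which is where fuller grammars and richer knowledge resources come in.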

This book has several purposes. First, we want to explore the use of NLP techniques in text mining, as well as some other technologies that are novel to the field of text mining. Second, we wish to explore novel ways of integrating various technologies, old or new, to solve a text mining problem. Third, we would like to look at some new applications for text mining. Finally, we have several chapters that provide supporting techniques for text mining, NLP, or both, or enhancements to existing techniques.
