CHAPTER TWELVE:
TEXT MINING
CONTEXT AND PERSPECTIVE
Gillian is a historian and archivist at a national museum in the United States. She has recently
curated an exhibit on the Federalist Papers. The Federalist Papers are a series of 85 essays
that were written and published in the late 1700s. The essays were published in two different
newspapers in the state of New York over the course of about one year, and they were released
anonymously under the author name 'Publius'. Their intent was to educate the American people
about the new nation's proposed constitution, and to advocate in favor of its ratification. No one
really knew at the time whether 'Publius' was one individual or many, but several people familiar with
the authors and framers of the constitution had spotted some patterns in vocabulary and sentence
structure that seemed similar to sections of the U.S. Constitution. Years later, after Alexander
Hamilton died in 1804, notes were discovered revealing that Hamilton,
James Madison and John Jay had been the authors of the papers. The notes indicated specific
authors for some papers, but not for others. Specifically, John Jay was revealed to be the author
of papers 3, 4 and 5; Madison of paper 14; and Hamilton of paper 17. Paper 18 had no author
named, but there was evidence that Hamilton and Madison worked on that one together.
LEARNING OBJECTIVES
After completing the reading and exercises in this chapter, you should be able to:
Explain what text mining is, how it is used and the benefits of using it.
Recognize the various formats that text can take as input for text mining.
Connect to and import text as a data source for a text mining model.
Develop a text mining model in RapidMiner including common text-parsing operators
such as tokenization, stop word filtering, n-gram construction, stemming, etc.
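
To make the parsing steps named in the last objective concrete, here is a minimal sketch in Python rather than RapidMiner. It is not the chapter's method. The stop word list is a tiny illustrative assumption, `crude_stem` is a hypothetical suffix stripper standing in for a real stemming algorithm, and joining n-grams with underscores is simply a convention chosen for this example.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in", "is", "for"}


def tokenize(text):
    """Tokenization: split text into lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Stop word filtering: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Stemming (crude): strip one common suffix if at least 3 letters remain.

    This is a hypothetical stand-in, not a real stemming algorithm.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n=2):
    """N-gram construction: join each run of n consecutive tokens."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text, n=2):
    """Run the four steps in order; return the stems followed by their n-grams."""
    tokens = remove_stop_words(tokenize(text))
    stems = [crude_stem(t) for t in tokens]
    return stems + ngrams(stems, n)
```

Calling `preprocess("The ratification of the proposed Constitution")` would yield the stems `ratification`, `propos` and `constitution`, followed by the bigrams `ratification_propos` and `propos_constitution`. Word lists like these, built per document, are the raw material for comparing vocabulary patterns across papers.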