Apply other data mining models to text mining results in order to predict or classify
based on textual analysis.
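As a rough illustration of that last objective, the Python sketch below turns each text into a word-frequency vector and ranks candidate authors by cosine similarity to an unattributed text. This is not the RapidMiner workflow the chapter builds. The function names are hypothetical, and cosine-similarity ranking is a deliberately simple stand-in for the predictive or classification models the chapter refers to.

```python
import math
import re
from collections import Counter

# Illustrative sketch only; not the chapter's RapidMiner process.

def term_frequencies(text: str) -> Counter:
    """Lowercase the text and count word tokens (runs of letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing terms, so terms absent from b add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_authors(unknown: str, known: dict) -> list:
    """Score an unattributed text against each known author's text, best match first."""
    target = term_frequencies(unknown)
    scores = [(author, cosine_similarity(target, term_frequencies(text)))
              for author, text in known.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_authors(paper_18_text, {"Hamilton": paper_17_text, "Madison": paper_14_text})` (the variable names are hypothetical) would return the candidates sorted by similarity. Such scores are suggestive at best, because papers on the same subject share much vocabulary for reasons that have nothing to do with authorship.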
ORGANIZATIONAL UNDERSTANDING
Gillian would like to analyze the content of paper 18 in the context of the other papers with known
authors, to see whether she can generate evidence that the suspected collaboration between
Hamilton and Madison is in fact likely. She believes text mining could be a good way to analyze
the text in a structured manner, and has enlisted our help. Having studied all of the
Federalist Papers and other writings by the three statesmen who wrote them, Gillian is
confident that paper 18 is a collaboration to which John Jay did not contribute: his vocabulary and
grammatical structure were quite different from those of Hamilton and Madison, even when all
three wrote on the same topic, as they did in the Federalist Papers. She would like to compare
word and phrase choice frequencies across the papers and present the outcome as part of her exhibit.
We will help Gillian by constructing a text mining model using the text from the Federalist Papers
and some standard text mining methodologies.
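Gillian's idea of comparing word and phrase choice frequencies can be sketched in plain Python. The sketch assumes a simple lowercase, letters-only tokenizer (a simplification; RapidMiner's text processing steps may tokenize differently) and approximates a "phrase" as an n-gram, that is, a run of n consecutive words. Counts are then normalized so that papers of different lengths can be compared. All names here are illustrative.

```python
import re
from collections import Counter

def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens: list, n: int) -> Counter:
    """Count every run of n consecutive tokens, joined by single spaces (n=1 gives word counts)."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def relative_frequencies(counts: Counter) -> dict:
    """Divide each count by the total so that texts of different lengths are comparable."""
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}
```

Running `ngram_counts(tokenize(text), 2)` on each paper gives its two-word phrase counts; comparing the relative frequencies of the same phrases across papers is one simple way to look for shared habits of word choice.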
DATA UNDERSTANDING
Gillian's data set is simple: we will include the full text of Federalist Papers numbers 5 (Jay), 14
(Madison), 17 (Hamilton), and 18 (suspected collaboration between Madison and Hamilton). The
Federalist Papers are widely available: they have been republished in book form, they appear on
many web sites, and their text is archived in libraries throughout the world. For this chapter's
exercise, the text of these four papers has been added to the companion web site. There are four
files for you to download:
Chapter12_Federalist05_Jay.txt
Chapter12_Federalist14_Madison.txt
Chapter12_Federalist17_Hamilton.txt
Chapter12_Federalist18_Collaboration.txt
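If you want to inspect the downloaded files outside RapidMiner, a small loader like the one below reads them as plain text. It assumes the four files sit together in one folder and are UTF-8 encoded; both are assumptions, as are the dictionary labels and the hypothetical `load_papers` helper name.

```python
from __future__ import annotations

from pathlib import Path

# File names as listed in the chapter; the labels are illustrative.
PAPER_FILES = {
    "Jay (No. 5)": "Chapter12_Federalist05_Jay.txt",
    "Madison (No. 14)": "Chapter12_Federalist14_Madison.txt",
    "Hamilton (No. 17)": "Chapter12_Federalist17_Hamilton.txt",
    "Collaboration? (No. 18)": "Chapter12_Federalist18_Collaboration.txt",
}

def load_papers(folder: str | Path) -> dict[str, str]:
    """Read each downloaded paper as text, keyed by label; fail clearly if a file is missing."""
    base = Path(folder)
    papers = {}
    for label, filename in PAPER_FILES.items():
        path = base / filename
        if not path.is_file():
            raise FileNotFoundError(f"Expected {filename} in {base}")
        # UTF-8 is assumed; undecodable bytes are replaced rather than raising an error.
        papers[label] = path.read_text(encoding="utf-8", errors="replace")
    return papers
```

Calling `load_papers("federalist")` (the folder name is hypothetical) returns a dictionary of four strings that can then be tokenized and compared.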
Please download these now, but do not import them into a RapidMiner repository. The process of
handling textual data in RapidMiner is somewhat different from what we have done in past chapters.
With the text of these four papers available to us, we can move directly into the CRISP-DM phase of…
 
 