6) We can see by looking at the first several attributes that for document ID 1, the file is
Chapter12_Federalist05_Jay.txt. Thus if we can't remember that we added paper 5 first,
resulting in RapidMiner labeling it document 1, we can check it in the document details.
This little trick works when you have used the Read Document operator, because the name of
the file being read becomes the value of the metadata_file attribute. With some other
operators, such as the Create Document operator, it does not work, as you will see
momentarily. Since we added our papers in numerical order in this chapter's example, we
do not necessarily need to view and sort the details for each of the documents, but you
may if you wish. Knowing that documents 1 and 2 are Jay (no. 5) and Madison (no. 14),
and documents 3 and 4 are Hamilton (no. 17) and suspected collaboration (no. 18), we can
be encouraged by what we see in this model. It appears that Hamilton does have something
to do with Federalist Paper 18, but we don't know about Madison yet because Madison
was grouped with Jay, probably as a result of the previously discussed mean balancing that
k-means clustering is prone to do.
7) Perhaps we can address this by better training our model to recognize Jay's writing. Using
your favorite search engine, search the Internet for the text of Federalist Paper No. 3.
Gillian knows that this paper's authorship has been connected to John Jay. We will use the
text to train our model to better recognize Jay's writing. If paper 18 was written by Jay, or
even if he contributed to it, perhaps we will find that it gets clustered with Jay's papers 3
and 5 when we add paper 3 to the model. In this case, Hamilton and Madison should get
clustered together. If on the other hand paper 18 was not written or contributed to by Jay,
paper 18 should gravitate toward Hamilton (no. 17) and/or Madison (no. 14), so long as
Jay was consistent in his writing between papers 3 and 5. Copy the text of paper 3 by
highlighting it on whichever web site you found it (it is available on a number of sites). Then,
in the design perspective in RapidMiner, locate the Create Document operator and drag it into
your process (Figure 12-23).
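
The reasoning in this step (add a paper of known authorship, re-run the clustering, and see which group paper 18 joins) can also be sketched outside RapidMiner. The Python below is only an illustration under stated assumptions: it is a plain k-means loop, not RapidMiner's implementation; the function names are hypothetical; and the two-number "style" vectors are invented solely to exercise the algorithm, so they say nothing about who actually wrote any of the papers.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    """Minimal k-means sketch; returns one cluster label per point.

    Assumption: the first k points seed the centroids so that the run is
    repeatable. Real tools usually pick the starting centroids at random.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Invented feature vectors, one per paper (NOT real measurements).
papers = {
    "paper_5": (1.0, 1.0),
    "paper_14": (1.2, 0.9),
    "paper_17": (5.0, 5.0),
    "paper_18": (4.8, 5.2),
}
before = dict(zip(papers, k_means(list(papers.values()), k=2)))

# Add a further document of known authorship and cluster again.
papers["paper_3"] = (0.9, 1.1)
after = dict(zip(papers, k_means(list(papers.values()), k=2)))

print("before:", before)
print("after: ", after)
```

Comparing the label of paper_18 in the two dictionaries mirrors what you will do in the cluster model view: if the label it shares changes once paper 3 is added, the extra training document has shifted the grouping.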