3) Unfortunately, it looks like at least one of our four documents ended up associated with
John Jay's paper (no. 5). This probably happened for two reasons: (1) we are using the
k-Means methodology, which assigns each document to the nearest cluster mean and, in doing
so, tends to settle on a split with roughly equal parts on both sides; and (2) Jay was writing
on the same topic as Hamilton and Madison. Because the essays are so similar to one another,
the means can balance easily even if Jay didn't contribute to paper 18. The topic alone
creates enough similarity that paper 18 could be grouped with Jay, especially when the
operator we've chosen is trying to find an even balance. We can see how the four papers have
been clustered by clicking on the Folder View radio button and expanding both of the folder
menu trees.
Figure 12-21. Examining the document clusters.
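To make the clustering step concrete outside RapidMiner, here is a minimal sketch of k-means with k = 2 on four word-count vectors. It is not RapidMiner's implementation. The four text strings are invented placeholders rather than excerpts from the essays, so the output says nothing about authorship; the helper names (tf_vector, euclidean, k_means) and the choice of the first and last documents as starting means are assumptions made for illustration. The paper numbers are listed in the order the documents are added, which is how the document numbers 1 to 4 are assigned.

```python
from collections import Counter
import math

# Paper numbers in the order the documents are added (documents 1 to 4).
paper_numbers = [5, 14, 17, 18]

# Invented placeholder texts, NOT excerpts from the Federalist Papers.
texts = [
    "foreign danger union safety foreign",
    "union foreign safety danger",
    "state power government state",
    "government power state authority",
]

def tf_vector(text, vocab):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(vectors, initial_means, max_iter=100):
    """Assign each vector to its nearest mean, recompute means, repeat."""
    means = [list(m) for m in initial_means]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(means)), key=lambda c: euclidean(v, means[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(means)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old mean if a cluster is empty
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means

vocab = sorted({word for text in texts for word in text.lower().split()})
vectors = [tf_vector(text, vocab) for text in texts]

# Assumption: start from the first and last documents as the two means.
labels, means = k_means(vectors, [vectors[0], vectors[-1]])

# Translate document positions back to paper numbers.
clusters = {}
for doc_number, (paper, label) in enumerate(zip(paper_numbers, labels), start=1):
    clusters.setdefault(label, []).append(paper)
    print(f"document {doc_number} = paper {paper} -> cluster {label}")
```

Because k-means only measures distance to the nearest mean, documents that use similar words land in the same cluster regardless of who wrote them, which is the effect described in step 3.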
4) We can see that the first two papers and the last two papers were grouped together. This
can be a bit confusing because RapidMiner has renumbered the documents from 1 to 4, in
the order that we added them to our model. In the book's example, we added them in
numerical order: 5, 14, 17, and then 18. So paper 5 corresponds to document 1, paper 14
corresponds to document 2, and so forth. If we can't remember the order in which we
added the papers to the model, we can click on the little white page icon to the left of the
document number to view the document's details: