1) Using your favorite search engine, locate a web site or discussion forum on the Internet
where people have posted complaints, criticisms or pleas for help regarding a company or
an industry (e.g. airlines, utility companies, insurance companies, etc.).
2) Copy and paste at least ten of these posts or comments into a text editor, saving each one
as its own text document with a unique name.
3) Open a new, blank process in RapidMiner, and using the Read Documents operator,
connect to each of your ten (or more) text documents containing the customer complaints
you found.
4) Process these documents in RapidMiner. Be sure to tokenize, and use other handlers in
your sub-process as you deem appropriate or necessary. Experiment with n-grams and stemming.
5) Use k-Means clustering to group your documents into two, three or more clusters. Output
your word list as well.
6) Report the following:
a. Based on your word list, what seem to be the most common complaints or issues in
your documents? Why do you think that is? What evidence can you give to
support your claim?
b. Based on your word list, are there some terms or phrases that show up in all, or at
least most of your documents? Why do you think these are so common?
c. Based on your clusters, what groups did you get? What are the common themes in
each of your clusters? Is this surprising? Why or why not?
d. How might a customer service manager use your model to address the common
concerns or issues you found?
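To make steps 4 and 5 more concrete, here is a minimal Python sketch of what tokenizing, stemming, building n-grams, producing a word list and running k-Means involve. It is not how RapidMiner implements these operators. The four sample complaints, the stopword list, the crude suffix-stripping stemmer and every function name are invented for illustration, and the k-Means uses the first k documents as starting centroids so the result is repeatable.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real process would use a much fuller one.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "my", "was", "is",
             "i", "for", "on", "in", "it"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def crude_stem(token):
    """Rough suffix stripping; a stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bigrams(tokens):
    """Join each pair of adjacent tokens into a 2-gram such as 'lost_luggage'."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def word_list(docs_tokens):
    """Count total occurrences and document occurrences for every term."""
    total, doc_freq = Counter(), Counter()
    for tokens in docs_tokens:
        total.update(tokens)
        doc_freq.update(set(tokens))
    return total, doc_freq


def vectorize(docs_tokens, vocab):
    """Turn each document into a term-count vector over the vocabulary."""
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, max_iter=20):
    """Plain k-Means; the first k vectors serve as the initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    assignments = [None] * len(vectors)
    for _ in range(max_iter):
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # no document changed cluster, so the result is stable
        assignments = new_assignments
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Invented sample complaints standing in for the posts you collect in step 2.
docs = [
    "My flight was delayed and the airline lost my luggage",
    "The agent refused a refund for my cancelled ticket",
    "Flight delayed again, luggage lost again",
    "Still waiting for the refund on my cancelled ticket",
]

docs_tokens = [[crude_stem(t) for t in tokenize(d)] for d in docs]
total, doc_freq = word_list(docs_tokens)
vocab = sorted(total)
assignments = k_means(vectorize(docs_tokens, vocab), k=2)

print(doc_freq.most_common(5))  # terms appearing in the most documents
print(bigrams(docs_tokens[0]))  # 2-grams for the first complaint
print(assignments)              # cluster index for each document
```

The document-occurrence counts correspond to the kind of evidence questions 6a and 6b ask about, and the cluster assignments to question 6c. With real posts you would expect noisier results than this toy data gives.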
Challenge Step!
7) Using your knowledge from past chapters, remove the k-Means clustering operator, and
try to apply a different data mining methodology such as association rules or decision trees
to your text documents. Report your results.
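As one way to picture the association-rules option, the sketch below treats each document as a set of terms and reports term pairs that meet a support and confidence threshold. It is a simplified pairwise illustration in Python, not RapidMiner's FP-Growth or Create Association Rules operators. The transactions, the thresholds and the function name pair_rules are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Find one-to-one rules (lhs -> rhs) between terms that co-occur in documents.

    support    = share of documents containing both terms
    confidence = share of documents containing lhs that also contain rhs
    """
    n = len(transactions)
    item_count, pair_count = Counter(), Counter()
    for items in transactions:
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Invented term sets, one per document, standing in for your processed posts.
transactions = [
    {"flight", "delay", "luggage"},
    {"refund", "ticket"},
    {"flight", "delay"},
    {"refund", "ticket", "delay"},
]

rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

With only ten or so documents, thresholds like these are arbitrary, so expect to adjust them before any rules appear.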