Text Mining - Data Mining for the Masses

1) Using your favorite search engine, locate a web site or discussion forum on the Internet where people have posted complaints, criticisms or pleas for help regarding a company or an industry (e.g., airlines, utility companies, insurance companies).

2) Copy and paste at least ten of these posts or comments into a text editor, saving each one as its own text document with a unique name.

3) Open a new, blank process in RapidMiner and, using the Read Documents operator, connect to each of your ten (or more) text documents containing the customer complaints you found.

4) Process these documents in RapidMiner. Be sure to tokenize, and use other handlers in your sub-process as you deem appropriate or necessary. Experiment with n-grams and stemming.

5) Use the k-Means clustering operator to group your documents into two, three or more clusters. Output your word list as well.

6) Report the following:

a. Based on your word list, what seem to be the most common complaints or issues in your documents? Why do you think that is? What evidence can you give to support your claim?

b. Based on your word list, are there terms or phrases that show up in all, or at least most, of your documents? Why do you think these are so common?

c. Based on your clusters, what groups did you get? What are the common themes in each of your clusters? Is this surprising? Why or why not?

d. How might a customer service manager use your model to address the common concerns or issues you found?
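
For readers who want to see what steps 4 and 5 compute, the following is a minimal Python sketch of the same ideas: tokenizing, stop-word filtering, stemming, bigrams, a word list, and k-Means clustering. It is not how RapidMiner implements these operators. The sample posts in DOCS, the STOP_WORDS set, the naive stem function, and the choice to seed k-Means with fixed documents (rather than random starting points) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample posts; substitute the complaints you collected.
DOCS = [
    "My flight was delayed and the airline lost my luggage.",
    "Flight delayed again, no refund offered.",
    "Lost luggage and nobody answers the phone.",
    "The billing department charged me twice.",
    "Charged twice on my bill and no refund yet.",
]

# Tiny illustrative stop-word list (an assumption, not RapidMiner's list).
STOP_WORDS = {"a", "and", "the", "my", "me", "was", "on", "no"}


def stem(token):
    """Naive suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def terms_for(text):
    """Tokenize, drop stop words, stem, then append bigrams (2-grams)."""
    tokens = [stem(t) for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOP_WORDS]
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def word_list(docs):
    """Return (total occurrences, number of documents) for every term."""
    total, doc_freq = Counter(), Counter()
    for text in docs:
        terms = terms_for(text)
        total.update(terms)
        doc_freq.update(set(terms))
    return total, doc_freq


def vectorize(docs):
    """Turn each document into a term-count vector over a shared vocabulary."""
    counts = [Counter(terms_for(text)) for text in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


def kmeans(vectors, seeds, iterations=10):
    """Plain k-Means; centroids start at the documents whose indices are in seeds."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    total, doc_freq = word_list(DOCS)
    for term, n_docs in doc_freq.most_common(8):
        print(term, "occurrences:", total[term], "documents:", n_docs)
    vocab, vectors = vectorize(DOCS)
    print(kmeans(vectors, seeds=(0, 3)))
```

On these five sample posts, seeding with one travel complaint and one billing complaint places the three travel posts in one cluster and the two billing posts in the other. The document counts from word_list are the kind of evidence questions 6a and 6b ask for.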

Challenge Step!

7) Using your knowledge from past chapters, remove the k-Means clustering operator and try to apply a different data mining methodology, such as association rules or decision trees, to your text documents. Report your results.
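
As one possible direction for the challenge step, the sketch below shows the arithmetic behind association rules applied to text: for each ordered pair of terms it computes support (the share of documents containing both terms) and confidence (the share of documents containing the first term that also contain the second). It is a brute-force pairwise calculation written for illustration, not the frequent-itemset algorithm a data mining tool would use. The DOC_TERMS sets and the two thresholds are hypothetical.

```python
from itertools import permutations

# Hypothetical term sets, one per document, as a tokenizing step might produce.
DOC_TERMS = [
    {"flight", "delay", "luggage"},
    {"flight", "delay", "refund"},
    {"luggage", "phone"},
    {"bill", "charge", "twice"},
    {"bill", "charge", "refund"},
]


def association_rules(doc_terms, min_support=0.4, min_confidence=0.8):
    """Return (premise, conclusion, support, confidence) for term pairs."""
    n = len(doc_terms)
    vocab = sorted(set().union(*doc_terms))
    rules = []
    for a, b in permutations(vocab, 2):
        has_a = sum(1 for terms in doc_terms if a in terms)
        both = sum(1 for terms in doc_terms if a in terms and b in terms)
        support = both / n           # share of all documents with a and b
        confidence = both / has_a    # share of documents with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


if __name__ == "__main__":
    for a, b, support, confidence in association_rules(DOC_TERMS):
        print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

With these sample sets, only the pairs bill/charge and delay/flight pass both thresholds, in each direction. Lowering min_support or min_confidence admits weaker rules, which is worth experimenting with on a small document collection.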
