9.8 Gaining Insights
So far this chapter has discussed several text analysis tasks including text collection,
text representation, TFIDF, topic models, and sentiment analysis. This section
shows how ACME uses these techniques to gain insights into customer opinions
about its products. To keep the example simple, this section only uses bPhone to
illustrate the steps.
For the data collection phase, the Data Science team used bPhone as the keyword
to collect more than 300 reviews from a popular technical review website.
The reviews are first visualized as a word cloud. A word cloud (or tag cloud) is a
visual representation of textual data. Tags are generally single words, and the
importance of each word is shown with font size or color.
Figure 9.9 shows the word cloud built from the 300 reviews. Before the cloud was
built, the reviews were case folded and tokenized into lowercased words, and stop
words were removed from the text. The more frequently a word appears, the larger
its font size in Figure 9.9. The orientation of each word is purely aesthetic.
Most of the graph is taken up by the words phone and bphone, which occur
frequently but are not very informative. Overall, the graph reveals little
information, so the team needs to conduct further analyses on the data.
Figure 9.9 Word cloud on all 300 reviews on bPhone
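The preprocessing behind a word cloud such as Figure 9.9 can be sketched in a few lines of Python. This is a minimal illustration and not the team's actual pipeline: the stop-word list, the sample reviews, and the function name word_frequencies are all assumptions made for the example. The resulting counts are what a word cloud renderer would map to font sizes.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "i", "this", "but", "my", "of", "to"}


def word_frequencies(reviews, stop_words=STOP_WORDS):
    """Case fold, tokenize, drop stop words, and count the remaining words."""
    counts = Counter()
    for review in reviews:
        # Case folding followed by a simple regex tokenizer.
        tokens = re.findall(r"[a-z0-9']+", review.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts


# Made-up reviews, for illustration only.
sample_reviews = [
    "The bPhone is a great phone and the screen is sharp.",
    "I love this phone, but the bPhone battery is weak.",
]

freqs = word_frequencies(sample_reviews)
for word, count in freqs.most_common(5):
    print(word, count)
```

Even on this toy input, phone and bphone come out as the most frequent words, which mirrors the problem seen in the figure.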
Fortunately, the popular technical review website allows users to provide ratings on
a scale from one to five when they post reviews. The team can divide the reviews into
subgroups using those ratings.
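As a rough sketch of this step, the reviews can be grouped by their star rating. The (rating, text) pairs below are invented for illustration, and group_by_rating is a hypothetical helper. The text does not say exactly which subgroups the team uses, so this example simply keeps one group per rating value.

```python
from collections import defaultdict

# Hypothetical (rating, review text) pairs; ratings run from one to five.
rated_reviews = [
    (5, "Best phone I have owned."),
    (1, "The battery died after a week."),
    (5, "Gorgeous screen and fast."),
    (3, "Decent, but overpriced."),
]


def group_by_rating(reviews):
    """Return a dict mapping each rating to the list of review texts with that rating."""
    groups = defaultdict(list)
    for rating, text in reviews:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        groups[rating].append(text)
    return dict(groups)


subgroups = group_by_rating(rated_reviews)
for rating in sorted(subgroups):
    print(rating, len(subgroups[rating]))
```

Each subgroup can then be analyzed on its own, for example by building a separate word cloud for the five-star reviews and for the one-star reviews.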
To reveal more information, the team can remove words such as phone, bPhone,
and ACME, which are not very useful for the study. Related research often refers to
these words as domain-specific stop words. Figure 9.10 shows the word cloud