Advanced Analytical Theory and Methods: Text Analysis - Data Science and Big Data Analytics


9.2 A Text Analysis Example

To further describe the three text analysis steps, consider the fictitious company ACME, maker of two products: bPhone and bEbook. ACME is in strong competition with other companies that manufacture and sell similar products. To succeed, ACME needs to produce excellent phones and eBook readers and increase sales.

One way the company does this is to monitor what is being said about ACME products on social media. In other words, what is the buzz about its products? ACME wants to search everything that is said about its products on social media sites, such as Twitter and Facebook, and on popular review sites, such as Amazon and ConsumerReports. It wants to answer questions such as these:

• Are people mentioning its products?
• What is being said? Are the products seen as good or bad? If people think an ACME product is bad, why? For example, are they complaining about the battery life of the bPhone, or the response time of their bEbook?
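
As a rough illustration of how the first question, and the product-specific part of the second, could be approached with simple keyword matching, the Python sketch below uses regular expressions to count how many posts mention each product and how many of those posts also mention an aspect such as battery life or response time. The sample posts, the PRODUCTS and ASPECTS pattern dictionaries, and the count_mentions helper are all hypothetical and for illustration only. The sketch detects co-occurrence only; it does not decide whether a comment is positive or negative.

```python
import re
from collections import Counter

# Hypothetical sample posts; real input would be collected from social media
# and review sites.
posts = [
    "Loving my new bPhone, but the battery life is terrible.",
    "The bEbook response time is so slow when turning pages.",
    "Just bought a bphone! Great camera.",
    "Anyone tried the new tablet from another brand?",
]

# Hypothetical case-insensitive, whole-word patterns for the two product names.
PRODUCTS = {
    "bPhone": re.compile(r"\bbphone\b", re.IGNORECASE),
    "bEbook": re.compile(r"\bbebook\b", re.IGNORECASE),
}

# Hypothetical aspect keywords matching the complaints ACME is interested in.
ASPECTS = {
    "battery life": re.compile(r"\bbattery\s+life\b", re.IGNORECASE),
    "response time": re.compile(r"\bresponse\s+time\b", re.IGNORECASE),
}


def count_mentions(texts):
    """Count posts mentioning each product, and each (product, aspect) pair."""
    product_counts = Counter()
    aspect_counts = Counter()
    for text in texts:
        for product, pattern in PRODUCTS.items():
            if pattern.search(text):
                product_counts[product] += 1
                # Attribute any aspect keyword in the same post to this product.
                for aspect, aspect_pattern in ASPECTS.items():
                    if aspect_pattern.search(text):
                        aspect_counts[(product, aspect)] += 1
    return product_counts, aspect_counts


if __name__ == "__main__":
    products, aspects = count_mentions(posts)
    print(products)
    print(aspects)
```

Because an aspect keyword is credited to every product named in the same post, a post that mentions both the bPhone and the bEbook would count toward both; a more careful analysis would need to look at which product each phrase actually refers to.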

ACME can monitor the social media buzz using a simple process based on the three steps outlined in Section 9.1. This process is illustrated in Figure 9.1, and it consists of the modules in the following list.

1. Collect raw text (Section 9.3). This corresponds to Phase 1 and Phase 2 of the Data Analytics Lifecycle. In this step, the Data Science team at ACME monitors websites for references to specific products. The websites may include social media and review sites. The team could interact with social network application programming interfaces (APIs), process data feeds, or scrape pages, using product names as keywords to get the raw data. Regular expressions are commonly used in this case to identify text that matches certain patterns. Additional filters can be applied to the raw data for a more focused study. For example, retrieving only the reviews originating in New York instead of the entire United States would allow ACME to conduct regional studies on its products. Generally, it is good practice to apply filters during the data collection phase, because they can reduce I/O workloads and minimize storage requirements.

2. Represent text (Section 9.4). Convert each review into a suitable document representation with proper indices, and build a corpus based on these
