Figure 4.1: Wordle™ Word Cloud
Search and Count
Google and Yahoo rapidly became household terms because of their ability to
search the web for specific topics. A typical search engine offers the ability to
search documents using a set of search terms and may find a large number of
candidate documents. It prioritizes the results based on preset criteria that can
be influenced by how we choose the documents.
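The search-and-prioritize idea can be sketched in a few lines of Python. This is only an illustration, not how Google or Yahoo rank results: the scoring rule (total occurrences of the search terms), the function names, and the sample documents are all assumptions made for the example.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(documents, query):
    """Return (doc_id, score) pairs for documents containing any search term.

    Assumed ranking criterion: the score is the total number of times the
    search terms occur in the document; higher scores come first.
    """
    terms = tokenize(query)
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            results.append((doc_id, score))
    # Sort by descending score, then by document id to keep ties stable.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical documents, invented for illustration.
documents = {
    "a": "Gold prices rose as gold bullion demand grew.",
    "b": "The sprinter won a gold medal.",
    "c": "Network coverage improved this year.",
}
ranked = search(documents, "gold bullion")
print(ranked)
```

Document "c" contains neither search term, so it is not a candidate at all; the other two are ordered by how often the terms appear.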
If I have a lot of unstructured data, I can count words to find the most
commonly used words. Wordle™ (www.wordle.net) provides word clouds for
the unstructured data provided to it. For example, Figure 4.1 shows a word cloud
for the text used in this topic. The font size represents the number of times a word
was used in the text.
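A rough sketch of the counting behind a word cloud follows. It is not Wordle's actual algorithm: the stop-word list, the linear mapping from count to font size, and the size limits are assumptions chosen to keep the example short.

```python
from collections import Counter
import re

# A tiny, assumed stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "a", "of", "to", "and", "in", "is", "for"}


def word_counts(text):
    """Count how often each word occurs, ignoring case and stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def font_sizes(counts, min_size=10, max_size=48):
    """Scale each count linearly to a font size between min_size and max_size."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in counts.items()
    }


counts = word_counts("data search data count search data")
sizes = font_sizes(counts)
print(counts.most_common(3))
print(sizes)
```

The most frequent word receives the largest font and the least frequent the smallest, which is the relationship the figure relies on.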
This data can be laid out against other known dimensions. For example, this
summer we were working on unstructured data analytics for a CSP in India. We
received a large quantity of unstructured text. Our first exercise was to use the
Text Analytics capabilities in Cognos® Consumer Insight (CCI) to study key
words being used as plotted against time. Figure 4.2 shows the results of this
word count plotted against time.
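Laying word counts out against time amounts to grouping the text by period before counting. The sketch below shows the idea in plain Python; it does not use or represent the CCI product, and the monthly bucket, the function name, and the sample records (including their dates) are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date
import re


def keyword_counts_by_month(records, keywords):
    """Count keyword occurrences per calendar month.

    records is an iterable of (date, text) pairs. The result maps a
    "YYYY-MM" label to a Counter of keyword occurrences in that month.
    """
    wanted = {k.lower() for k in keywords}
    by_month = defaultdict(Counter)
    for day, text in records:
        month = day.strftime("%Y-%m")
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word in wanted:
                by_month[month][word] += 1
    return dict(by_month)


# Hypothetical records, invented for illustration.
records = [
    (date(2012, 6, 3), "3g signal is weak"),
    (date(2012, 6, 20), "Waiting for 4g, 3g is slow"),
    (date(2012, 7, 5), "4g launch soon, 4g plans?"),
]
timeline = keyword_counts_by_month(records, ["3g", "4g"])
for month in sorted(timeline):
    print(month, dict(timeline[month]))
```

Each month's counts become one point per keyword on a chart with time on the horizontal axis.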
Context-Sensitive and Domain-Specific Searches
Anyone with telecommunications knowledge can easily understand what “3g”
and “4g” in Figure 4.2 refer to. Context-sensitive search engines can differentiate
between “gold medal” (Olympics) and “gold bullion” (commodity trading).
Also, some of the search engines are fine-tuned for industry or corporate terms.
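One simple way to picture context sensitivity is to look at the words that appear near an ambiguous term. The sketch below is a toy illustration, not how any particular search engine works: the cue-word lists, the window of three words on each side, and the function name are all assumptions.

```python
import re

# Assumed cue words for each context; a domain-specific engine would
# maintain a much richer vocabulary of industry or corporate terms.
CONTEXT_CUES = {
    "olympics": {"medal", "athlete", "podium", "games"},
    "commodity trading": {"bullion", "price", "ounce", "futures"},
}


def classify_context(text, target="gold", window=3):
    """Guess the context of `target` from cue words within `window` words.

    Returns the best-matching context label, or None when no cue word
    appears near the target.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    scores = {label: 0 for label in CONTEXT_CUES}
    for i, word in enumerate(words):
        if word != target:
            continue
        nearby = words[max(0, i - window): i + window + 1]
        for label, cues in CONTEXT_CUES.items():
            scores[label] += sum(1 for w in nearby if w in cues)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify_context("She won a gold medal at the games"))
print(classify_context("Gold bullion price per ounce"))
print(classify_context("The gold paint dried"))
```

Swapping in a different cue dictionary is the toy equivalent of tuning a search engine for a particular industry.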
 