10.1 Analytics for Unstructured Data
Prior to conducting data analysis, the required data must be collected and processed
to extract the useful information. The degree of initial processing and data
preparation depends on the volume of data, as well as how straightforward it is to
understand the structure of the data.
Recall the four types of data structures discussed in Chapter 1, “Introduction to Big
Data Analytics”:
Structured: A specific and consistent format (for example, a data table)
Semi-structured: A self-describing format (for example, an XML file)
Quasi-structured: A somewhat inconsistent format (for example, a hyperlink)
Unstructured: An inconsistent format (for example, text or video)
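To make the four categories concrete, here is a small Python sketch that uses only the standard library. The sample records, the URL, and the capitalized-word regex are hypothetical and were invented for illustration. The sketch is meant to suggest how much parsing effort each category tends to demand; it is not a general-purpose classifier.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

# All sample data below is hypothetical, chosen only to illustrate each category.

# Structured: every row follows one column layout declared once in the header.
table = io.StringIO("id,name,city\n1,Alice,Boston\n2,Bob,Denver\n")
rows = list(csv.DictReader(table))

# Semi-structured: each element carries its own tags, so the data describes itself
# and a record may omit a field (Bob has no <city>, so findtext returns None).
xml_doc = (
    "<people>"
    "<person id='1'><name>Alice</name><city>Boston</city></person>"
    "<person id='2'><name>Bob</name></person>"
    "</people>"
)
people = [
    {"id": p.get("id"), "name": p.findtext("name"), "city": p.findtext("city")}
    for p in ET.fromstring(xml_doc).findall("person")
]

# Quasi-structured: a hyperlink follows loose conventions, so some effort is
# needed to pull out its parts (host, path, query parameters).
link = urlparse("https://www.example.com/search?q=big+data&page=2")
link_params = parse_qs(link.query)

# Unstructured: free text has no fields; any structure must be inferred,
# here with a deliberately naive regex that matches capitalized words.
text = "Alice moved to Boston in 2019, and Bob visited Denver last spring."
capitalized = re.findall(r"\b[A-Z][a-z]+\b", text)

print(rows)
print(people)
print(link.netloc, link_params)
print(capitalized)
```

Note that the naive regex treats place names and personal names alike. Separating them would require further analysis, which is part of why unstructured data usually needs the most preparation.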
Structured data, such as relational database management system (RDBMS) tables,
is typically the easiest data format to interpret. However, in practice it is still
necessary to understand the various values that may appear in a certain column
and what these values represent in different situations (based, for example, on
the contents of the other columns for the same record). Also, some columns may
contain unstructured text or stored objects, such as pictures or videos. Although the
tools presented in this chapter focus on unstructured data, they can also be
applied to more structured datasets.
10.1.1 Use Cases
The following material provides several use cases for MapReduce. The MapReduce
paradigm offers the means to break a large task into smaller tasks, run tasks in
parallel, and consolidate the outputs of the individual tasks into the final output.
Apache Hadoop includes a software implementation of MapReduce. More details on
MapReduce and Hadoop are provided later in this chapter.
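The break-apart, run-in-parallel, and consolidate pattern described above can be sketched on a single machine in Python. This sketch is not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and word counting is used only because it is a conventional illustration. `ThreadPoolExecutor` stands in for a cluster of worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map (hypothetical helper): emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Shuffle (hypothetical helper): group intermediate values by key across all map outputs."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce (hypothetical helper): consolidate all values for one key into one result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Break the large task into smaller tasks: one map task per input chunk,
    # run in parallel by a pool of worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    # Consolidate: one reduce task per distinct key, also run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reduced = list(pool.map(lambda kv: reduce_phase(*kv), grouped.items()))
    return dict(reduced)


chunks = ["the quick brown fox", "the lazy dog", "The fox jumps"]
print(word_count(chunks))
```

Threads keep the sketch short, but because of CPython's global interpreter lock they do not give true CPU parallelism. A framework such as Hadoop instead distributes the map and reduce tasks across separate machines and handles the shuffle over the network.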
IBM Watson
In 2011, IBM's computer system Watson participated in the U.S. television game
show Jeopardy against two of the best Jeopardy champions in the show's history.
In the game, the contestants are provided a clue such as “He likes his martinis
shaken, not stirred” and the correct response, phrased in the form of a question,