What Is Text Analysis?
For each of the example scenarios we just mentioned, the challenge is to parse
the text, find the elements that are being searched for, understand their meaning,
and extract them in a structured form for use in other applications. IBM has a lot
of experience in this area, and we've personally seen a lot of organizations try
to get started on their own. Because of this, we can tell you that it's no easy
task: you need a toolkit with accelerators, an integrated development environment
(IDE), and preferably a declarative language to make this consumable
and reachable for most organizations. After all, you can't democratize Big Data
across your organization if the analysis depends solely on skill sets that are
nearly impossible to find or learn. Beyond the fact that text data is unstructured,
languages are complex, even when you don't factor in spelling mistakes,
abbreviations, or advanced usage, such as sarcasm. As such, you need a system
that is deep and flexible enough to handle this complexity.
What follows is an example of this process, in which a text analysis application
reads a paragraph of text and derives structured information based
on various rules. These rules are defined in extractors, which can, for instance,
identify an entity's name within a text field. Consider the following text:
In the 2012 UEFA European Football Championship, Spain
continued their international success, beating Italy 4-0
in the Final. Spanish winger David Silva opened the scoring
early in the game, beating Italian goalie Gianluigi
Buffon. After a full 90 minutes of dominance, goalkeeper
Iker Casillas accepted the championship trophy for Spain.
The product of these extractors is annotated text, as shown in the
underlined text in this passage. The following structured data is derived
from this example text:
Name               Position     Country
David Silva        Winger       Spain
Gianluigi Buffon   Goalkeeper   Italy
Iker Casillas      Goalkeeper   Spain
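
To make the idea of rule-based extraction concrete, here is a minimal sketch in plain Python that derives the rows above from the example passage. It is not the declarative language or toolkit alluded to earlier; the dictionaries, the rule patterns, and the names `PLAYER_RULE`, `COUNTRY_RULE`, and `extract_players` are all assumptions made for illustration, and the rules are tuned only to this one passage.

```python
import re

# Hypothetical dictionaries for this sketch; a real extractor would rely on
# much larger, curated lists.
POSITIONS = {"winger": "Winger", "goalie": "Goalkeeper", "goalkeeper": "Goalkeeper"}
NATIONALITIES = {"Spanish": "Spain", "Italian": "Italy"}

# Rule 1: an optional nationality adjective, a position word, then a
# two-word capitalized name.
PLAYER_RULE = re.compile(
    r"\b(?:(?P<nat>{nat})\s+)?(?P<pos>{pos})\s+"
    r"(?P<name>[A-Z][a-z]+\s+[A-Z][a-z]+)".format(
        nat="|".join(NATIONALITIES), pos="|".join(POSITIONS)
    )
)

# Rule 2 (fallback): "for <Country>" later in the same sentence.
COUNTRY_RULE = re.compile(
    r"\bfor\s+(?P<country>{})\b".format("|".join(sorted(set(NATIONALITIES.values()))))
)

TEXT = (
    "In the 2012 UEFA European Football Championship, Spain continued their "
    "international success, beating Italy 4-0 in the Final. Spanish winger "
    "David Silva opened the scoring early in the game, beating Italian goalie "
    "Gianluigi Buffon. After a full 90 minutes of dominance, goalkeeper "
    "Iker Casillas accepted the championship trophy for Spain."
)


def extract_players(text):
    """Return (name, position, country) tuples found by the two rules."""
    rows = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in PLAYER_RULE.finditer(sentence):
            # Prefer the nationality adjective; otherwise try the fallback rule.
            country = NATIONALITIES.get(match.group("nat"))
            if country is None:
                later = COUNTRY_RULE.search(sentence, match.end())
                country = later.group("country") if later else None
            rows.append((match.group("name"), POSITIONS[match.group("pos")], country))
    return rows


if __name__ == "__main__":
    for name, position, country in extract_players(TEXT):
        print(f"{name:<18} {position:<12} {country}")
```

Even this toy version needs a synonym dictionary ("goalie" mapped to Goalkeeper) and a fallback rule for the country, and it would miss single-word names or differently phrased sentences, which hints at why hand-rolled rules become hard to maintain as the text grows more varied.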
The challenge is to ensure the accuracy of results. Accuracy has two components,
precision and recall:
Precision: A measure of exactness, the percentage of items in the result
set that are relevant: “Are the results you're getting valid?” For example,
if you wanted to extract all the play-by-play descriptions associated with
 