Differentiate Yourself with Text Analytics - Harness the Power of Big Data

What Is Text Analysis?

For each of the example scenarios we just mentioned, the challenge is to parse
the text, find the elements that are being searched for, understand their meaning,
and extract them in a structured form for use in other applications. IBM has a lot
of experience in this area, and we've personally seen a lot of organizations try
to get started on their own. Because of this, we can tell you that it's no easy
task: you need a toolkit with accelerators, an integrated development environment
(IDE), and preferably a declarative language to make this consumable and
reachable for most organizations. After all, you can't democratize Big Data
across your organization if the analysis depends solely on skill sets that are
nearly impossible to find or learn. Beyond the fact that text data is unstructured,
languages are complex, even when you don't factor in spelling mistakes,
abbreviations, or advanced usage, such as sarcasm. As such, you need a system
that is deep and flexible enough to handle complexity.

What follows is an example of this process, in which a text analysis application
reads a paragraph of text and derives structured information based on various
rules. These rules are defined in extractors, which can, for instance, identify
an entity's name within a text field. Consider the following text:

In the 2012 UEFA European Football Championship, Spain
continued their international success, beating Italy 4-0
in the Final. Spanish winger David Silva opened the scoring
early in the game, beating Italian goalie Gianluigi
Buffon. After a full 90 minutes of dominance, goalkeeper
Iker Casillas accepted the championship trophy for Spain.

The product of these extractors is annotated text, as shown by the underlined
text in this passage. The following structured data is derived from this
example text:

Name              Position    Country
David Silva       Winger      Spain
Gianluigi Buffon  Goalkeeper  Italy
Iker Casillas     Goalkeeper  Spain
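
To make the idea of rule-based extraction concrete, here is a minimal sketch in Python of how the rows above could be derived from the passage. It is not the toolkit or the declarative language mentioned earlier. It uses plain regular expressions as a stand-in, and every name in it (POSITIONS, NATIONALITIES, PLAYER, TRAILING_COUNTRY, extract_players) is hypothetical. The small dictionaries and the two-capitalized-words name pattern are assumptions chosen to fit this one passage only.

```python
import re

TEXT = (
    "In the 2012 UEFA European Football Championship, Spain "
    "continued their international success, beating Italy 4-0 "
    "in the Final. Spanish winger David Silva opened the scoring "
    "early in the game, beating Italian goalie Gianluigi Buffon. "
    "After a full 90 minutes of dominance, goalkeeper Iker Casillas "
    "accepted the championship trophy for Spain."
)

# Hypothetical dictionaries: surface forms mapped to normalized values.
POSITIONS = {"winger": "Winger", "goalie": "Goalkeeper", "goalkeeper": "Goalkeeper"}
NATIONALITIES = {"Spanish": "Spain", "Italian": "Italy"}

# Rule 1: an optional nationality adjective, a position word, then a
# two-word capitalized name (an assumption that fits this passage only).
PLAYER = re.compile(
    r"(?:(?P<nationality>" + "|".join(NATIONALITIES) + r")\s+)?"
    r"(?P<position>" + "|".join(POSITIONS) + r")\s+"
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+)"
)

# Rule 2 (fallback): "for <Country>" later in the same sentence.
TRAILING_COUNTRY = re.compile(r"\bfor (Spain|Italy)\b")


def extract_players(text):
    """Return one dict per player mention found by the rules above."""
    rows = []
    for match in PLAYER.finditer(text):
        country = NATIONALITIES.get(match.group("nationality"))
        if country is None:
            # No adjective before the position: scan the rest of the sentence.
            end = text.find(".", match.end())
            rest = text[match.end(): end if end != -1 else len(text)]
            trailing = TRAILING_COUNTRY.search(rest)
            country = trailing.group(1) if trailing else None
        rows.append(
            {
                "name": match.group("name"),
                "position": POSITIONS[match.group("position")],
                "country": country,
            }
        )
    return rows


if __name__ == "__main__":
    for row in extract_players(TEXT):
        print(f"{row['name']:<18}{row['position']:<12}{row['country']}")
```

Even in this toy form, the rules are brittle: a three-word name, a misspelled position, or an unlisted country would be missed, which is why the dictionaries and patterns in a real extractor need to be far richer.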

The challenge is to ensure the accuracy of results. Accuracy has two
components, precision and recall:

• Precision: A measure of exactness, the percentage of items in the result
set that are relevant: “Are the results you're getting valid?” For example,
if you wanted to extract all the play-by-play descriptions associated with
