Differentiate Yourself with Text Analytics - Harness the Power of Big Data

continual development since 2004, and its engine has shipped with many IBM products, including Lotus Notes, Cognos Consumer Insight, and more. IBM InfoSphere BigInsights (BigInsights) and IBM InfoSphere Streams (Streams) break new ground by including the SystemT technology in the Advanced Text Analytics Toolkit (and associated accelerators), which opens up this once “black box” technology for customization and for more general-purpose use than was possible when it was delivered as a function within a product. This toolkit includes a declarative language, Annotation Query Language (AQL), with an associated cost-based optimizer; an IDE for writing rules; a text analytics processing engine (ready for MapReduce and streaming data settings); and a number of built-in text extractors that include hundreds of rules pre-developed through IBM's customer engagements across a myriad of industries. The Advanced Text Analytics Toolkit also contains multilingual support, including support for double-byte character languages (through Unicode). With its optimizer, AQL assistance framework, and debugging tools, the Advanced Text Analytics Toolkit is poised to democratize analysis of unstructured data in the same way that SQL has for database queries.
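To make the idea of a text extractor built from many rules more concrete, here is a minimal sketch in Python. It is not AQL and does not use the toolkit's API; the four patterns, the `Span` type, and the `extract_phones` function are hypothetical names chosen for illustration, and a real extractor would carry far more rules. Each rule targets one way of writing a telephone number, the matches of all rules are unioned, and overlapping matches are consolidated so that each number is reported once.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Span(NamedTuple):
    begin: int   # offset of the first character of the match
    end: int     # offset one past the last character
    text: str    # the matched text itself


# Illustrative rules only: each covers one way of writing a phone number.
PHONE_RULES = [
    re.compile(r"\(\d{3}\) \d{3}-\d{4}"),       # (555) 123-4567
    re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),       # 555-123-4567
    re.compile(r"\b\d{3}-\d{4}\b"),             # 123-4567 (overlaps the two above)
    re.compile(r"\+\d{1,3}(?: \d{2,4}){2,4}"),  # +44 20 7946 0958
]


def extract_phones(text: str) -> list[Span]:
    """Union the matches of every rule, then drop overlapping matches."""
    candidates = [
        Span(m.start(), m.end(), m.group())
        for rule in PHONE_RULES
        for m in rule.finditer(text)
    ]
    # Earliest start first; among spans with the same start, longest first.
    candidates.sort(key=lambda s: (s.begin, -(s.end - s.begin)))

    kept: list[Span] = []
    for span in candidates:
        if kept and span.begin < kept[-1].end:
            continue  # overlaps a span that was already kept
        kept.append(span)
    return kept


if __name__ == "__main__":
    sample = "Call (555) 123-4567 or 555-123-4567, or +44 20 7946 0958 from abroad."
    for span in extract_phones(sample):
        print(span.begin, span.end, span.text)
```

In this sketch the caller has to spell out how rules are combined and how overlaps are resolved. That hand-written procedure is the kind of detail a declarative language with an optimizer is meant to take off the rule author's hands.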

What's special about the Advanced Text Analytics Toolkit is its approach to text extraction: to ensure high accuracy (precision) and full coverage (recall), the solution builds many specific rules. This concept is built into AQL and its run-time engine, which form the heart of the Advanced Text Analytics Toolkit. AQL enables you to aggregate these many rules into a single extractor. For example, an extractor for telephone numbers can contain literally hundreds of rules to match the many ways that people around the world write them. In addition, AQL is a fully declarative language, which means that all these overlapping rules get distilled and optimized into a highly efficient access path (similar to an SQL compiler for relational databases, where IBM researchers first developed this declarative concept), while the complexity of the underlying system is abstracted from the end user. Quite simply, when you write extraction logic in AQL, you tell the IBM Big Data platform what to extract, and the platform figures out how to extract it. This is an important differentiator of the IBM Big Data platform. The use of declarative languages (for example, AQL, the Streams Processing Language, and the machine learning statistical language) not only has significant performance benefits when this analytics code gets
