continual development since 2004, and its engine has shipped with many
IBM products, including Lotus Notes, Cognos Consumer Insight, and more.
IBM InfoSphere BigInsights (BigInsights) and IBM InfoSphere Streams (Streams)
break new ground by including the SystemT technology in the Advanced Text
Analytics Toolkit (and associated accelerators), which opens up this once “black
box” technology for customization and more general-purpose use than when it
was delivered as a function within a product. This toolkit includes a declarative
language—Annotated Query Language (AQL)—with an associated cost-
based optimizer, an IDE to write rules, a text analytics processing engine
(ready for MapReduce and streaming data settings), and a number of built-in
text extractors that include hundreds of rules pre-developed through IBM's
customer engagements across a myriad of industries. The Advanced Text
Analytics Toolkit also contains multilingual support, including support for
double-byte character languages (through Unicode). With its optimizer,
AQL assistance framework, and debugging tools, the Advanced Text Analytics
Toolkit is poised to democratize analysis of unstructured data in the same
way that SQL has for database queries.
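To make the idea of an extractor assembled from many small rules more concrete, here is a minimal sketch in Python. It is not AQL and does not use any SystemT or BigInsights API; the rule list, the Span type, and the extract_phones function are hypothetical names chosen for illustration, and the four patterns stand in for the hundreds a real telephone-number extractor might carry. Each rule is applied independently, and overlapping matches are then consolidated by keeping the leftmost match, preferring the longest when two start at the same position.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    text: str


# Hypothetical rules for illustration only; each covers one way of
# writing a telephone number.
PHONE_RULES = [
    re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),      # (555) 123-4567
    re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),  # 555-123-4567 or 555.123.4567
    re.compile(r"\b\d{3}-\d{4}\b"),              # 123-4567 (local form)
    re.compile(r"\+\d{1,3}(?:[ -]\d{2,4}){2,4}"),  # +44 20 7946 0958
]


def extract_phones(text: str) -> List[Span]:
    """Apply every rule, then consolidate overlapping matches."""
    candidates = [
        Span(m.start(), m.end(), m.group())
        for rule in PHONE_RULES
        for m in rule.finditer(text)
    ]
    # Order by start position; for equal starts, longest match first.
    candidates.sort(key=lambda s: (s.start, -(s.end - s.start)))
    kept: List[Span] = []
    for span in candidates:
        # Drop any candidate that overlaps the span kept just before it.
        if not kept or span.start >= kept[-1].end:
            kept.append(span)
    return kept


if __name__ == "__main__":
    sample = "Call (555) 123-4567 or 555.123.4567, or +44 20 7946 0958."
    for span in extract_phones(sample):
        print(span.start, span.end, span.text)
```

In this sketch the order of evaluation and the consolidation policy are written out by hand. In a declarative language, those decisions are left to the optimizer.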
What's special about the Advanced Text Analytics Toolkit is its approach
to text extraction: to ensure high accuracy (precision) and full coverage
(recall), the solution builds many specific rules. This concept is built into
AQL and its run-time engine, which form the heart of the Advanced Text
Analytics Toolkit. AQL enables you to aggregate these many rules to represent
an individual extractor. For example, an extractor for telephone numbers
can contain hundreds of rules to match the many ways that
people around the world express this concept. In addition, AQL is a fully
declarative language, which means that all these overlapping rules get
distilled and optimized into a highly efficient access path (similar to an SQL
compiler for relational databases, where IBM researchers first developed this
declarative concept), while the complexity of the underlying system is
abstracted from the end user. Quite simply, when you write extraction logic
in AQL, you tell the IBM Big Data platform what to extract, and the platform
figures out how to extract it. This is an important differentiator of the IBM Big
Data platform. The use of declarative languages (for example, AQL, the
Streams Processing Language, and the machine learning statistical language)
not only has significant performance benefits when this analytics code gets