optimized, but it also hides the complexity of Hadoop from analysts. Notice
that we referenced a number of Big Data declarative languages that are part
of the IBM platform. While this chapter focuses on text analytics, it's important
to realize that different Big Data projects require different kinds of optimiza-
tions. For example, text analytics is heavily dependent on CPU for processing.
At the same time, crunching through trillions of key-value pairs in Hadoop
would tax a system's I/O capabilities (as you see in Hadoop's terasort and
grep benchmarks). By providing highly optimized runtimes for the specific
tasks at hand, the platform lets Big Data practitioners focus on
analysis and discovery as opposed to performance tuning.
To the best of our knowledge, there are no other fully declarative text
analytics languages available on the market today. You'll find high-level and
medium-level declarative languages, but they all make use of locked-up
“black-box” modules that can't be customized, restricting flexibility and
making it difficult to optimize for performance.
Being able to evolve text extractors is vitally important, because very few
things ever remain the same when it comes to analysis. We see this often with
social media analytics when popular slang terms or acronyms quickly become
“tired” (for example, a few years ago, many people would say “that's sick!”
if they liked something, but that's not used as often now—and we're happy
about that for obvious reasons).
AQL is designed to be easily modifiable, and when you do make changes,
your new code is optimized in tandem with existing code. In addition, AQL
is designed for reuse, enabling you to share analytics across organizations.
You can build discrete sets of extractors and use them as building blocks so
that you don't have to “start from scratch” all the time.
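
To make the building-block idea concrete, here is a minimal sketch written in Python rather than AQL. It is only an analogy: the names Span, regex_extractor, dictionary_extractor, and combine are hypothetical helpers invented for this illustration, and the patterns are deliberately simplistic stand-ins, not the extractors that ship with the toolkit. What it shows is that small extractors can be defined once, reused, and swapped out independently (for example, when a slang dictionary goes stale).

```python
import re
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    """A labeled region of text found by an extractor (hypothetical type)."""
    begin: int
    end: int
    text: str
    label: str


# An extractor is any function that maps a document to a list of spans.
Extractor = Callable[[str], List[Span]]


def regex_extractor(label: str, pattern: str) -> Extractor:
    """Build an extractor that labels every match of a regular expression."""
    compiled = re.compile(pattern)

    def extract(text: str) -> List[Span]:
        return [Span(m.start(), m.end(), m.group(), label)
                for m in compiled.finditer(text)]

    return extract


def dictionary_extractor(label: str, terms: List[str]) -> Extractor:
    """Build an extractor that matches whole words from a list, ignoring case."""
    alternatives = "|".join(re.escape(term) for term in terms)
    compiled = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

    def extract(text: str) -> List[Span]:
        return [Span(m.start(), m.end(), m.group(), label)
                for m in compiled.finditer(text)]

    return extract


def combine(*extractors: Extractor) -> Extractor:
    """Compose several extractors into one; results are ordered by position."""
    def extract(text: str) -> List[Span]:
        spans = [span for ex in extractors for span in ex(text)]
        return sorted(spans, key=lambda span: span.begin)

    return extract


# Reusable building blocks (illustrative patterns only).
phone_number = regex_extractor("PhoneNumber", r"\b\d{3}-\d{3}-\d{4}\b")
url = regex_extractor("URL", r"https?://\S+")

# The slang list is the part most likely to change; replacing it does not
# require touching the other extractors.
slang = dictionary_extractor("Slang", ["sick"])

social_media = combine(slang, phone_number, url)

if __name__ == "__main__":
    sample = "That's sick! Call 555-123-4567 or see http://example.com/info"
    for span in social_media(sample):
        print(span.label, span.begin, span.end, span.text)
```

When a slang term falls out of use, only the slang extractor needs to be rebuilt with a new term list; the phone number and URL blocks, and anything else that reuses them, stay exactly as they were.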
The Advanced Text Analytics Toolkit includes sets of built-in extractors for
elements that are commonly found in text collections. For example, Person
(names), PhoneNumber, Address, and URL are just some of the many
extractors that are shipped with BigInsights and Streams. In addition to a
generic set of extractors, there are also extensive collections of extractors
for social media text and for common types of log data. These built-in
extractors considerably shorten the time it takes to develop effective text
analytics. You can also build on these libraries with your own customizations,