Differentiate Yourself with Text Analytics - Harness the Power of Big Data

optimized, but it also hides the complexity of Hadoop from analysts. Notice that we referenced a number of Big Data declarative languages that are part of the IBM platform. While this chapter focuses on text analytics, it's important to realize that different Big Data projects require different kinds of optimizations. For example, text analytics is heavily dependent on CPU for processing. At the same time, crunching through trillions of key-value pairs in Hadoop would tax a system's I/O capabilities (as you see in the Hadoop terasort and grep benchmarks). By providing highly optimized runtimes for the specific tasks at hand, the platform lets Big Data practitioners focus on analysis and discovery as opposed to performance tuning.

To the best of our knowledge, there are no other fully declarative text analytics languages available on the market today. You'll find high-level and medium-level declarative languages, but they all make use of locked-up “black-box” modules that can't be customized, which restricts flexibility and makes it difficult to optimize for performance.

Being able to evolve text extractors is vitally important, because very few things ever remain the same when it comes to analysis. We see this often with social media analytics when popular slang terms or acronyms quickly become “tired” (for example, a few years ago, many people would say “that's sick!” if they liked something, but that's not used as often now, and we're happy about that for obvious reasons).

AQL is designed to be easily modifiable, and when you do make changes, your new code is optimized in tandem with existing code. In addition, AQL is designed for reuse, enabling you to share analytics across organizations. You can build discrete sets of extractors and use them as building blocks so that you don't have to “start from scratch” all the time.
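
To make the building-block idea concrete, here is a minimal sketch in Python. It is not AQL, and it is not how the toolkit implements its extractors. The names `Span`, `regex_extractor`, and `combine` are hypothetical helpers invented for this illustration, and the two patterns are deliberately simplistic assumptions, not the definitions of the shipped extractors. The sketch only shows how small extractors can be defined once and then composed into a larger one.

```python
import re
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    """One extracted piece of text and where it was found."""
    label: str
    begin: int
    end: int
    text: str


# An extractor is any function that maps a document to a list of spans.
Extractor = Callable[[str], List[Span]]


def regex_extractor(label: str, pattern: str) -> Extractor:
    """Build a small, reusable extractor from a regular expression."""
    compiled = re.compile(pattern)

    def extract(text: str) -> List[Span]:
        return [Span(label, m.start(), m.end(), m.group())
                for m in compiled.finditer(text)]

    return extract


def combine(*extractors: Extractor) -> Extractor:
    """Compose existing extractors into a larger one, ordered by position."""

    def extract(text: str) -> List[Span]:
        spans = [span for ex in extractors for span in ex(text)]
        return sorted(spans, key=lambda s: (s.begin, s.end))

    return extract


# Hypothetical building blocks with toy patterns (assumptions for illustration).
phone_number = regex_extractor("PhoneNumber", r"\b\d{3}-\d{3}-\d{4}\b")
url = regex_extractor("URL", r"https?://[^\s,]+")

# A larger extractor assembled from the blocks above, without starting from scratch.
contact_info = combine(phone_number, url)

if __name__ == "__main__":
    sample = "Call 555-123-4567 or visit http://example.com for details."
    for span in contact_info(sample):
        print(span.label, span.text)
```

Because `contact_info` is itself an extractor, it can be passed to `combine` again, which is the kind of layering the building-block approach is meant to allow.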

The Advanced Text Analytics Toolkit includes sets of built-in extractors for elements that are commonly found in text collections. For example, Person (names), PhoneNumber, Address, and URL are just some of the many extractors that are shipped with BigInsights and Streams. In addition to a generic set of extractors, there are also extensive collections of extractors for social media text and for common types of log data. These built-in extractors significantly shorten the time it takes to develop effective text analytics. You can also build on these libraries with your own customizations,
