goals scored from the UEFA Championships transcripts, and 30 of the 76
passages identified as goal commentaries weren't for goals at all, your
precision would be just over 60 percent (46 of 76). In summary, precision
describes how many of the identified passages are correctly identified.
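The precision calculation can be written out as a small Python sketch. The function name is illustrative rather than taken from any toolkit. True positives are flagged passages that really are goals, and false positives are flagged passages that are not. In the UEFA example this gives 46 / 76, about 60.5 percent.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of retrieved passages that are actually relevant."""
    retrieved = true_positives + false_positives
    if retrieved == 0:
        return 0.0  # convention used here: nothing retrieved, precision 0
    return true_positives / retrieved

# UEFA example: 76 passages flagged as goal commentary, 30 of them not goals.
flagged = 76
wrong = 30
p = precision(flagged - wrong, wrong)
print(f"precision = {p:.1%}")  # precision = 60.5%
```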
Recall A measure of completeness, the percentage of relevant results
that are retrieved from the text; in other words, are all the valid strings
from the original text showing up? For example, if you wanted to extract
all of the goals scored in the UEFA Championships from video, and your
application found 60 of the 76 goals that a human expert would find, your
recall would be about 79 percent, because your application missed about
21 percent of the goals scored. In summary, recall is how many matching
passages are found out of the total number of matching passages.
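Recall follows the same pattern with a different denominator: every relevant passage a human expert would find, whether or not the application retrieved it. The sketch below is a minimal illustration, and the function name is hypothetical. With 60 of 76 goals found, recall is about 78.9 percent and the miss rate is about 21.1 percent.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of all relevant passages that were actually retrieved."""
    relevant = true_positives + false_negatives
    if relevant == 0:
        return 0.0  # convention used here: nothing to find, recall 0
    return true_positives / relevant

# UEFA example: a human expert finds 76 goals; the application finds 60.
expert_count = 76
found = 60
r = recall(found, expert_count - found)
print(f"recall = {r:.1%}, missed = {1 - r:.1%}")  # recall = 78.9%, missed = 21.1%
```

The only difference from precision is the denominator. Precision divides by what was retrieved, while recall divides by what should have been retrieved.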
As analysts develop their extractors and applications, they iteratively
make refinements to tune their precision and recall rates. A great analogy is
an avalanche. If the avalanche didn't pick up speed and more snow as it tumbles
down a mountain slope, it wouldn't have much impact. The development of
extractors is really about adding more rules and knowledge to the
extractor itself; in short, it's about getting more powerful with each iteration.
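The iterative tuning described here can be illustrated with a toy rule-based extractor in Python. This is only a sketch. The labeled sentences are invented for illustration, and the regular-expression rules are hypothetical stand-ins for the richer rules a real toolkit would use. Each iteration adds knowledge to the rule set, and precision and recall are re-measured after every step.

```python
import re

# Invented, hand-labeled toy sample: (commentary line, is_actually_a_goal).
SAMPLE = [
    ("GOAL! Torres slots it home.", True),
    ("He scores! What a strike.", True),
    ("The ball is in the back of the net!", True),
    ("Goal kick for Italy.", False),
    ("Silva shoots, but it's wide.", False),
]

# Iteration 1: naive keyword. Iteration 2: exclude "goal kick" (precision).
# Iteration 3: add other ways commentators describe a goal (recall).
ITERATIONS = [
    [r"\bgoal\b"],
    [r"\bgoal\b(?!\s+kick)"],
    [r"\bgoal\b(?!\s+kick)", r"\bscores?\b", r"back of the net"],
]

def evaluate(rules):
    """Return (precision, recall) of a rule set against SAMPLE."""
    tp = fp = fn = 0
    for text, is_goal in SAMPLE:
        hit = any(re.search(rule, text, re.IGNORECASE) for rule in rules)
        if hit and is_goal:
            tp += 1
        elif hit:
            fp += 1  # flagged, but not a goal
        elif is_goal:
            fn += 1  # a goal the rules missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for i, rules in enumerate(ITERATIONS, start=1):
    p, r = evaluate(rules)
    print(f"iteration {i}: precision={p:.2f} recall={r:.2f}")
```

On this toy sample, the loop prints precision 0.50 and recall 0.33 for the first iteration. The second iteration gives 1.00 and 0.33, and the third reaches 1.00 and 1.00. The exclusion rule improves precision, and the added phrasings improve recall.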
We've found that most marketplace approaches to text analytics present
challenges for analysts, because they tend to perform poorly (in terms of both
accuracy and speed) and they are difficult to build or modify. These approaches
flow the text forward through a system of extractors and filters, with no
optimization. This technique is inflexible and inefficient, often resulting in
redundant processing, because extractors applied later in the workflow
might do work that had already been completed earlier. From what we can
tell, today's text toolkits are not only inflexible and inefficient, they're also
limited in their expressiveness (specifically, the degree of granularity that's
possible with their queries), which results in analysts having to develop custom
code. This, in turn, leads to more delays, complexity, and difficulty in
refining the accuracy of your result set (precision and recall).
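The redundant processing of an unoptimized, forward-flowing pipeline can be shown with a toy Python sketch. The three extractors and the `tokenize` step are hypothetical, and this is not how SystemT or any particular product works internally. In the first pipeline, each extractor re-derives the tokens it needs. In the second, one simple optimization is applied: the shared step is computed once and reused.

```python
# Hypothetical extractors; each one works on whitespace-separated tokens.
def find_capitalized(tokens):
    return [t for t in tokens if t[:1].isupper()]

def find_numbers(tokens):
    return [t for t in tokens if t.isdigit()]

def find_goal_words(tokens):
    return [t for t in tokens if t.lower().strip("!.,") == "goal"]

EXTRACTORS = [find_capitalized, find_numbers, find_goal_words]

def tokenize(text, stats):
    """Preprocessing every extractor depends on; `stats` counts its runs."""
    stats["tokenize_calls"] += 1
    return text.split()

def forward_pipeline(text):
    """Each extractor re-tokenizes the text itself: redundant work."""
    stats = {"tokenize_calls": 0}
    results = [extract(tokenize(text, stats)) for extract in EXTRACTORS]
    return results, stats["tokenize_calls"]

def shared_pipeline(text):
    """Tokenize once and hand the same tokens to every extractor."""
    stats = {"tokenize_calls": 0}
    tokens = tokenize(text, stats)
    results = [extract(tokens) for extract in EXTRACTORS]
    return results, stats["tokenize_calls"]

commentary = "GOAL! Torres scores in minute 33 for Spain"
print(forward_pipeline(commentary))
print(shared_pipeline(commentary))
```

Both pipelines return the same annotations. However, `tokenize` runs three times in the forward pipeline and only once in the shared one, and the gap grows with every extractor added.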
The Annotation Query Language to the Rescue!
To meet these challenges, the IBM Big Data platform provides the Advanced
Text Analytics Toolkit, which is designed specifically for the demands
inherent in Big Data. This toolkit (originally code-named SystemT) has been under
 