8
Differentiate Yourself with Text Analytics
Although Big Data classifications often fall into the structured,
semistructured, and unstructured buckets, we want to put forward the notion
that all data has some kind of structure (taking a picture with your
smartphone is likely to tag it with location awareness, a timestamp, metadata as
to its format and size, and so on). References to these varying degrees of
structure speak to the relative ease with which the data can be analyzed
and interpreted; the less structured it is, typically the more effort required
to extract insights. For example, a Facebook posting is structured data: it's
wrapped in the JavaScript Object Notation (JSON) format. However, it's
the free-form text within the structure's notation that's the unstructured
part—and the hardest part of the data set to analyze. We've gotten very
good at analyzing information in our databases, but that data has been
cleansed and distilled into a highly structured form. Where businesses are
finding enormous challenges today is in analyzing data that's not so nicely
formatted, such as emails, legal documents, social media messages, and log files.
As organizations increasingly rely on information that's locked in various
forms of textual data, it's critical that they're provided with a framework
that not only helps them make sense of what's in this text, but also helps
them do it in a cost-effective (think: without highly specialized skill sets) and
relatively quick manner.
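
To make the Facebook posting example concrete, here is a minimal sketch of how the structured wrapper can be separated from the free-form text inside it. The payload and its field names (id, created_time, from, message) are illustrative assumptions, not the actual schema of any social network's API, and split_post is a hypothetical helper introduced only for this example.

```python
import json

# Hypothetical post payload; the field names are illustrative assumptions,
# not the real schema of any social network's API.
raw_post = """
{
  "id": "12345",
  "created_time": "2012-06-01T14:30:00+0000",
  "from": {"name": "Example User"},
  "message": "Loved the new phone, but the battery barely lasts a day."
}
"""


def split_post(raw):
    """Separate a post's structured fields from its free-form text."""
    post = json.loads(raw)          # the JSON wrapper parses mechanically
    text = post.pop("message", "")  # the free-form part needs text analytics
    return post, text


structured, free_text = split_post(raw_post)
print(structured["created_time"])   # easy: a well-defined, typed field
print(free_text)                    # hard: sentiment, topics, and entities are implicit
```

Parsing the wrapper takes a single library call; deciding what the message means (here, mixed praise and complaint about a product) is the part that requires text analysis.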
There are many problem domains characterized by unstructured and
semistructured data. One area in which we think that text analysis can be a