Metadata in and of themselves are of limited value if they are not universally
adopted. Indeed, the challenge of big data is largely defined by the enormity of data and
the general inability to decipher not only what the data might mean, but also what the
data actually are. Overcoming this inability is the essential role that metadata play. The
process of applying metadata to an individual datum or to sets of data is referred to as
"indexing." Canonically, indexing has involved the use of tags that serve as a quick
reference to the contents of a particular data object. Indexing can be divided into two
major tasks: (1) organization of key metadata that are supplied by the creator of the data
object; and (2) application of additional metadata that are specific to how the data
objects are to be organized within an information retrieval system.
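The two indexing tasks can be illustrated with a small sketch. It assumes a hypothetical `DataObject` record and a hypothetical `index_object` function; the key normalization and the single-keyword `vocabulary` lookup are illustrative assumptions, not a description of how any particular retrieval system works.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A hypothetical data object, such as a journal article."""
    identifier: str
    text: str
    # Task (1): metadata supplied by the creator of the data object.
    creator_metadata: dict[str, str] = field(default_factory=dict)
    # Task (2): metadata applied by the retrieval system itself.
    system_tags: set[str] = field(default_factory=set)


def index_object(obj: DataObject, vocabulary: dict[str, str]) -> DataObject:
    """Organize creator metadata, then add system-specific index tags.

    `vocabulary` is a hypothetical mapping from single lowercase keywords
    to the controlled index terms this particular system uses.
    """
    # Task (1): organize creator-supplied metadata under normalized keys.
    obj.creator_metadata = {
        key.strip().lower(): value.strip()
        for key, value in obj.creator_metadata.items()
    }
    # Task (2): tag the object according to how this system organizes data.
    words = set(re.findall(r"[a-z]+", obj.text.lower()))
    for keyword, index_term in vocabulary.items():
        if keyword in words:
            obj.system_tags.add(index_term)
    return obj


article = DataObject(
    identifier="doc-1",
    text="Statin therapy reduced cardiovascular events in older adults.",
    creator_metadata={" Title ": " Statins in older adults ", "Author": "Smith J"},
)
index_object(article, {"statin": "Statins", "cardiovascular": "Cardiovascular Diseases"})
print(article.creator_metadata)  # {'title': 'Statins in older adults', 'author': 'Smith J'}
print(sorted(article.system_tags))  # ['Cardiovascular Diseases', 'Statins']
```

The same creator-supplied record could be indexed differently by two systems with different vocabularies. Task (2) is therefore kept separate from task (1).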
Indexing thus imposes structure on information that would otherwise be considered
unstructured. In the case of biomedical literature, the largest benefit of indexing is
that it makes it possible to retrieve appropriate information at the point and time of
need. For example, if one wanted to identify all literature published by a given author,
a well-indexed system would allow a query on the author name recorded for each data
object. Of course, this type of query has inherent challenges, since author names are
not necessarily unique.
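An author query against an index, and the ambiguity of names, can be sketched as follows. The records, the field names, and the `build_index` helper are hypothetical. Intersecting with an affiliation index is only one assumed heuristic for narrowing results. It is not a reliable method of author disambiguation.

```python
from collections import defaultdict

# Hypothetical records: rec-1 and rec-2 list the same author string, but
# nothing in that string says whether they are the same person.
RECORDS = [
    {"id": "rec-1", "author": "Smith J", "affiliation": "University A"},
    {"id": "rec-2", "author": "Smith J", "affiliation": "University B"},
    {"id": "rec-3", "author": "Lee K", "affiliation": "University A"},
]


def build_index(records, field_name):
    """Inverted index: each value of `field_name` -> set of record ids."""
    index = defaultdict(set)
    for record in records:
        index[record[field_name]].add(record["id"])
    return dict(index)


by_author = build_index(RECORDS, "author")
by_affiliation = build_index(RECORDS, "affiliation")

# Querying the author index is a single lookup, but it may conflate people.
hits = by_author.get("Smith J", set())
print(sorted(hits))  # ['rec-1', 'rec-2']

# Intersecting with a second index is one (imperfect) way to narrow the result.
narrowed = hits & by_affiliation.get("University A", set())
print(sorted(narrowed))  # ['rec-1']
```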
Metadata and their associated indexing process offer something that natural
language does not: quick, thumbnail descriptions of collections of data objects. On
the other hand, metadata do not necessarily reflect everything that a given data object
contains. Thus, one might consider metadata to allow for rapid, highly reliable
retrieval of related data objects, but more involved methodologies are required to shed
light on the specifics of what the data objects themselves contain. The large volumes of
data being generated suggest that there may be merit in leveraging computational
techniques such as NLU to facilitate the indexing process. Indeed, the National Library
of Medicine has been researching this very challenge through its Medical Text Indexer
(MTI) initiative [ 32 ]. The aforementioned MetaMap system is in fact a major component
of this initiative and has been shown to perform at levels similar to those of human
indexers. Such initiatives reflect a major paradigm shift that underlies the key challenge
of the era of big data: a challenge concerned less with generating new data than with
identifying the data relevant to a given set of needs.
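The following sketch shows the general idea of computer-assisted index-term suggestion. It uses a plain dictionary lookup with frequency ranking. It is not the method that MetaMap or MTI uses; those systems rely on far richer linguistic processing and much larger vocabularies. `THESAURUS` and `suggest_index_terms` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus mapping surface phrases to preferred index
# terms. It stands in for the much larger vocabularies real indexers use.
THESAURUS = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
    "aspirin": "Aspirin",
}


def suggest_index_terms(text, thesaurus=THESAURUS, top_n=3):
    """Rank preferred terms by how often any of their phrases occur in `text`."""
    # Lowercase and collapse punctuation so phrases match across it.
    normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
    counts = Counter()
    for phrase, term in thesaurus.items():
        # Word boundaries stop "aspirin" from matching inside longer words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, normalized))
        if hits:
            counts[term] += hits
    return [term for term, _ in counts.most_common(top_n)]


abstract = (
    "Aspirin after a heart attack: aspirin use in patients with "
    "hypertension and prior myocardial infarction."
)
print(suggest_index_terms(abstract))
# ['Myocardial Infarction', 'Aspirin', 'Hypertension']
```

Mapping synonyms ("heart attack", "myocardial infarction") to a single preferred term is what lets the suggestions act as index metadata rather than raw keywords. A human indexer would still review such suggestions.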
5.2.4 Modeling Techniques
Through the development and use of techniques such as NLU and metadata indexing,
it is possible to explore large volumes of text systematically. Beyond the day-to-day
utility of indexed information that can be readily accessed at the time of need (e.g.,
to identify the most relevant literature for a physician who needs the latest findings
on the efficacy of a particular treatment regimen), there is a significant potential
benefit to computable data that can be used to infer new