technologies indicates the beginning of a new form of continuous technology
advancement that is characterized by overlapping technology waves related
to different aspects of human activity from production and consumption to
collaboration and general social activity. In this context, data-intensive science
plays a key role.
Big data now relate to almost all aspects of human activity, from simply
recording events to research, design, production, and the delivery of digital
services or products to the final consumer. Current technologies, such as cloud
computing and ubiquitous network connectivity, provide a platform for
automating all processes in data collection, storage, processing, and visualization.
Modern e-science infrastructures make it possible to target new large-scale
problems that could not be solved previously (e.g., genome, climate, global
warming). E-science typically produces huge amounts of data that need to be
supported by a new type of e-infrastructure capable of storing, distributing,
processing, preserving, and curating these data [1, 2]. We refer to this new
type of infrastructure as the scientific data infrastructure (SDI).
In e-science, scientific data are complex, multifaceted objects with complex
internal relations. They are becoming an infrastructure of their own
and need to be supported by corresponding physical or logical infrastructures
to store, access, process, visualize, and manage these data.
The emerging SDI should allow different groups of researchers to work
on the same data sets, build their own (virtual) research and collaborative
environments, safely store intermediate results, and later share the discovered
results. New data provenance, security, and access control mechanisms
and tools should allow researchers to link their scientific results with the
initial data (sets) and intermediate data so that the data can be reused or
repurposed in the future (e.g., with improved research techniques and tools).
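
To make the provenance requirement more concrete, the following Python fragment is a minimal sketch of how a result could be linked back to its intermediate and initial data sets. It is an illustration under stated assumptions, not part of the SDI architecture described in this chapter: the `ProvenanceRecord` class, its fields, the `fingerprint` helper, and the sample data set names are all hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies a data set by its content."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceRecord:
    """Hypothetical record linking a data set to the data sets it was derived from."""
    name: str
    digest: str
    derived_from: list[ProvenanceRecord] = field(default_factory=list)
    method: str = ""  # free-text note on the technique or tool that was used

    def lineage(self) -> list[str]:
        """Names of all ancestor data sets, nearest first, back to the initial data."""
        names = []
        for parent in self.derived_from:
            names.append(parent.name)
            names.extend(parent.lineage())
        return names


# Initial data, one intermediate data set, and the final result (all made up).
raw = ProvenanceRecord("raw-observations", fingerprint(b"raw sensor readings"))
cleaned = ProvenanceRecord(
    "cleaned", fingerprint(b"cleaned readings"), [raw], method="outlier filter"
)
result = ProvenanceRecord(
    "published-result", fingerprint(b"final table"), [cleaned], method="regression"
)

print(result.lineage())  # ['cleaned', 'raw-observations']
```

Because each record stores a content digest, a later researcher could check that the data set being reused is the one the result was originally derived from before applying an improved technique to it.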
This chapter analyzes new challenges imposed on modern e-science infrastructures
by the emerging big data technologies. It proposes a general
approach and architecture solutions that constitute a new scientific data life
cycle management (SDLM) model and a generic SDI architecture model
that provides a basis for the interoperability and integration of heterogeneous
SDI components, in particular based on cloud infrastructure technologies.
The chapter is primarily focused on SDI; however, it also analyzes
the nature of big data in e-science, industry, and other domains; examines their
commonalities and differences; and discusses possible cross-fertilization
between these domains.
The chapter refers to ongoing research on defining the big data infrastructure
for e-science initially presented elsewhere [3, 4] and significantly extends
it with new results and a wider scope to investigate relations between big data
technologies in e-science and industry. With a long tradition of working with
a constantly increasing volume of data, modern science can offer industry
scientific analysis methods, while industry can bring big data technologies
and tools to wider public access.