proposed a framework called BIO-AJAX to standardize biological data so that
further computation can be conducted and search quality improved. With
BIO-AJAX, some errors and duplications can be eliminated, and common data
mining techniques can be executed more effectively.
3.2.3.3 Redundancy Elimination
Data redundancy refers to repeated or surplus data, which occurs in many
datasets. Data redundancy increases unnecessary data transmission costs and
causes defects in storage systems, e.g., wasted storage space, data
inconsistency, reduced data reliability, and data corruption. Therefore, various
redundancy reduction methods have been proposed, such as redundancy detection,
data filtering, and data compression. Such methods may apply to different datasets
or application environments. However, redundancy reduction may also bring about
certain negative effects; for example, data compression and decompression impose
additional computational burden. Therefore, the benefits of redundancy reduction
should be carefully weighed against its cost.
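As a rough illustration of this trade-off, the following sketch (Python, using the standard zlib module; the repetitive payload and the compression level are purely illustrative) measures how much space compression saves against the extra time spent compressing and decompressing:

```python
import time
import zlib

# Hypothetical, highly repetitive dataset standing in for redundant records.
payload = b"sensor_reading=23.5;" * 50_000

t0 = time.perf_counter()
compressed = zlib.compress(payload, 6)       # space saving ...
t1 = time.perf_counter()
restored = zlib.decompress(compressed)       # ... paid for with extra CPU time
t2 = time.perf_counter()
assert restored == payload

print(f"original:   {len(payload):>9} bytes")
print(f"compressed: {len(compressed):>9} bytes "
      f"({len(compressed) / len(payload):.1%} of original)")
print(f"compress: {(t1 - t0) * 1e3:.1f} ms, decompress: {(t2 - t1) * 1e3:.1f} ms")
```

Whether the saved transmission and storage cost outweighs the added computation depends on the data's redundancy and on how often it must be read back.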
Data collected from different fields will increasingly appear in image or video
formats. It is well known that images and videos contain considerable redundancy,
including temporal redundancy, spatial redundancy, statistical redundancy, and
sensing redundancy. Video compression is widely used to reduce redundancy in
video data, as specified in many video coding standards (MPEG-2, MPEG-4,
H.263, and H.264/AVC). In [47], the authors investigated the problem of video
compression in a video surveillance system with a video sensor network. They
proposed a new MPEG-4 based method that exploits the contextual redundancy
between background and foreground in a scene. The low complexity and the low
compression ratio of the proposed approach were demonstrated by the evaluation
results.
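The following is only a minimal sketch of the general idea, not the scheme of [47]: it uses OpenCV's MOG2 background subtractor (the input file name is hypothetical) to separate a largely static background from the moving foreground, the intuition being that the static part can be modeled once while only foreground changes need fine-grained encoding:

```python
import cv2

cap = cv2.VideoCapture("surveillance.avi")   # hypothetical surveillance clip
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)        # non-zero where the scene changed
    # The background model captures the contextual redundancy; only the
    # foreground pixels would need per-frame, fine-grained encoding.
    foreground = cv2.bitwise_and(frame, frame, mask=fg_mask)
    changed = cv2.countNonZero(fg_mask)
    print(f"foreground pixels this frame: {changed}")

cap.release()
```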
In generalized data transmission or storage, repeated data deletion (data
deduplication) is a specialized data compression technique that aims to eliminate
duplicate copies of data [48]. With data deduplication, each individual data block
or data segment is assigned an identifier (e.g., computed with a hash algorithm)
and stored, with the identifier added to an identification list. As deduplication
proceeds, if a new data block has an identifier that is already present in the
identification list, the new block is deemed redundant and is replaced by a
reference to the corresponding stored data block. Data deduplication can greatly
reduce storage requirements, which is particularly important for a big data
storage system.
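A minimal sketch of this idea in Python, assuming fixed-size blocks and SHA-256 digests as identifiers (production deduplication systems typically use variable-size chunking and more elaborate index structures):

```python
import hashlib

def deduplicate(stream: bytes, block_size: int = 4096):
    """Split a byte stream into fixed-size blocks; store each unique block
    only once, keyed by its SHA-256 digest (the 'identification list')."""
    store = {}      # digest -> block payload, stored exactly once
    layout = []     # ordered digests describing how to rebuild the stream
    for offset in range(0, len(stream), block_size):
        block = stream[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:      # new block: keep the payload
            store[digest] = block
        layout.append(digest)        # duplicate block: keep only a reference
    return store, layout

def restore(store, layout):
    return b"".join(store[d] for d in layout)

data = b"ABCD" * 4096                # highly repetitive example input
store, layout = deduplicate(data)
assert restore(store, layout) == data
print(len(store), "unique blocks for", len(layout), "logical blocks")
```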
Apart from the aforementioned data pre-processing methods, specific data objects
must undergo additional operations such as feature extraction, which plays an
important role in multimedia search and DNA analysis [49-51]. Usually,
high-dimensional feature vectors (or high-dimensional feature points) are used to
describe such data objects, and the system stores these feature vectors for
future retrieval. Data transfer is usually used to process distributed
heterogeneous data sources, especially business datasets [52].
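To make the feature-vector idea concrete, the following sketch (Python/NumPy; the 128-dimensional random vectors and the retrieve helper are purely hypothetical) stores normalized high-dimensional feature vectors and retrieves the stored objects nearest to a query by cosine similarity:

```python
import numpy as np

# Hypothetical 128-dimensional feature vectors describing 1000 data objects
# (e.g., image descriptors or k-mer frequency profiles), stored for retrieval.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 128))
features /= np.linalg.norm(features, axis=1, keepdims=True)

def retrieve(query, index, k=5):
    """Return indices of the k stored vectors most similar to the query
    (cosine similarity, i.e., dot product on normalized vectors)."""
    query = query / np.linalg.norm(query)
    scores = index @ query
    return np.argsort(scores)[::-1][:k]

query_vec = rng.normal(size=128)
print("top matches:", retrieve(query_vec, features))
```

Real systems replace the brute-force scan with an approximate nearest-neighbor index, but the stored representation is the same: one high-dimensional feature vector per object.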