process that store and communicate data must take into account the
size of the data sets used.
2. Dimensions of data: Sophisticated data capture technology allows
high-dimensional data to be extracted from the environment. In this
context, pattern recognition applications must be able to cater to
different dimensionalities of data in their implementations.
3. Algorithmic complexity: Existing pattern recognition schemes are
powerful and can provide highly accurate solutions. Nevertheless,
they incur high algorithmic complexity in their implementations,
which is attributed to the iterative nature and complex mathematical
foundations of the algorithms. Some algorithms have exponential
complexity and are infeasible for large-scale data. Furthermore, the
computations of existing pattern recognition schemes can be
time-consuming, especially when processing complex large-scale data.
These barriers are the common factors in determining the scalability of a par-
ticular pattern recognition approach. Each approach must be able to address
increasing size and dimensionality of the data, while minimizing its complexity.
In this regard, scalability evaluations of existing pattern recognition schemes
are valuable to most pattern recognition application developers.
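As one way to make such a scalability evaluation concrete, the sketch below counts the arithmetic work of a single query as the data size n and dimensionality d grow. The brute-force nearest-neighbor classifier is only an illustrative stand-in chosen here, not a scheme named in the text, and the function names nearest_neighbor and cost_per_query are hypothetical.

```python
import random


def nearest_neighbor(train, labels, query):
    """Brute-force 1-NN: return (predicted label, coordinate operations used)."""
    best_label, best_dist, ops = None, float("inf"), 0
    for point, label in zip(train, labels):
        dist = 0.0
        for a, b in zip(point, query):
            dist += (a - b) ** 2  # squared Euclidean distance, one term per dimension
            ops += 1
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label, ops


def cost_per_query(n, d, seed=0):
    """Count the operations of one query against n random points in d dimensions."""
    rng = random.Random(seed)
    train = [[rng.random() for _ in range(d)] for _ in range(n)]
    labels = [i % 2 for i in range(n)]  # arbitrary two-class labels (assumption)
    query = [rng.random() for _ in range(d)]
    _, ops = nearest_neighbor(train, labels, query)
    return ops


if __name__ == "__main__":
    # Vary size and dimensionality separately to see how the cost responds.
    for n, d in [(100, 2), (1000, 2), (1000, 20)]:
        print(f"n={n:5d} d={d:3d} operations per query={cost_per_query(n, d)}")
```

Counting operations rather than wall-clock time keeps the comparison independent of the machine. For this stand-in the per-query cost is proportional to n times d, so growth in either size or dimensionality is paid for directly.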
1.4.2 Possible Solutions
Scalability is an important factor in today's pattern recognition approaches.
The continuing growth of data in daily use shows that the capability of
existing algorithms must keep growing to serve Internet-scale data. For
example, according to Anderson [14], Google's servers process one petabyte
of data every 72 minutes. This value will continue to increase as
the storage and processing mechanisms advance. The question of scalability
as described by Pal and Mitra [15] is as follows: Can the pattern recognition
algorithm process large data sets efficiently, while building from them the best
possible models?
There are several techniques to scale up pattern recognition algorithms
for large-scale data sets. These techniques can be divided into a number of
approaches:
1. Data Approach: This type of technique modifies the data prior to the
recognition process. Some of the techniques are data reduction, dimen-
sionality reduction, and data partitioning. The aim of this approach is
to minimize the size and dimensionality of the data for efficient recognition.
However, this approach may undermine the data integrity by
representing the large data domain using a small data set.
2. Learning Approach: Pattern recognition algorithms require a learning
mechanism. This mechanism may be computationally expensive. There-