vertically, i.e., different records are distributed across the network. However,
there are situations where some parts of the records are stored in different
locations. For instance, a large database table is split into different sub-tables.
In addition, MapReduce operations produce many intermediary entities
between the map and reduce functions. These entities, as shown in
Figure 8.9, take the form of intermediate files, and the contents of these
files must be sorted before being fed to the reduce function. This
system-level sorting and redistribution incurs additional processing and
communication costs.
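To make the cost of the intermediate step concrete, the following is a minimal single-process sketch of the map, sort, and reduce stages. It is not Hadoop code; the record layout and the function names (map_phase, shuffle_and_sort, reduce_phase) are assumptions chosen for illustration only.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Emit one intermediate (key, value) pair per input record."""
    for rec in records:
        yield rec["dept"], rec["salary"]


def shuffle_and_sort(pairs):
    """Sort the intermediate pairs by key and group the values per key.

    This stands in for the sort and redistribution that sits between the
    map and reduce functions; in a real cluster it also moves data
    between machines.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped}


# Hypothetical input records.
records = [
    {"dept": "sales", "salary": 10},
    {"dept": "ops", "salary": 7},
    {"dept": "sales", "salary": 5},
]

intermediate = list(map_phase(records))   # the "intermediate file"
totals = reduce_phase(shuffle_and_sort(intermediate))
```

Every record contributes an intermediate pair, and all pairs pass through the sort before any reduction can start, which is where the extra processing and communication arise.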
In view of these issues, a data access scheme is considered that retrieves
data across multiple records and data segments in a single cycle and in
parallel. The access mechanism is implemented according to the nature of
the database. Retrieval is conducted on the set of records that reside in a
particular node, and the records themselves are not altered. Records in
each storage node are analyzed locally and in parallel, without incurring
any communication costs. A distributed pattern matching/recognition
approach, such as the DHGN, can be used to retrieve data from the cloud.
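The idea of node-local, parallel retrieval can be sketched as follows. This is only an illustration of the access pattern, not the DHGN algorithm itself: the node names, record fields, and the functions local_scan and parallel_retrieve are hypothetical, and threads stand in for independent storage nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage nodes, each holding its own set of records.
NODES = {
    "node-a": [{"id": 1, "city": "Perth"}, {"id": 2, "city": "Melbourne"}],
    "node-b": [{"id": 3, "city": "Melbourne"}],
    "node-c": [{"id": 4, "city": "Sydney"}],
}


def local_scan(node_records, field, value):
    """Examine the records held by one node; records are read, never modified."""
    return [rec["id"] for rec in node_records if rec.get(field) == value]


def parallel_retrieve(nodes, field, value):
    """Run the same scan on every node concurrently and collect matches per node."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = {
            name: pool.submit(local_scan, recs, field, value)
            for name, recs in nodes.items()
        }
        return {name: fut.result() for name, fut in futures.items()}


matches = parallel_retrieve(NODES, "city", "Melbourne")
```

Each scan touches only the records of its own node, so no records move between nodes during matching; only the identifiers of matching records are returned to the caller.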
8.3.2 DHGN Approach for Cloud Data Access
Through the redesign of the data management architecture, data records
are treated as patterns. This treatment enables data storage and retrieval
by association, over and above the existing simple data referential
mechanisms. Database processing and dynamic load handling are performed
through a distributed pattern recognition approach that is implemented in
integrated, loosely coupled computational networks. A divide-and-distribute
approach then allows these networks to be distributed dynamically within
the cloud. The DHGN cloud access scheme relies on communications between
adjacent nodes, and decentralized content location schemes are used to
discover the adjacent nodes in a minimal number of hops. A GN-based
algorithm for optimally distributing the DHGN subnets (clusters or
sub-domains) across the cloud nodes is provided to automate the
bootstrapping of the distributed application and to investigate dynamic
load balancing over the network. Figure 8.10 shows how DHGN subnets are
positioned in the cloud environment using Hadoop's distributed file system
(DFS) architecture.
Note that the DHGN subnets perform data mapping on each of the data
nodes within the DFS infrastructure. Within each DHGN subnet, the records
are stored in an associative pattern; each DHGN neuron corresponds to a
single data field. The mapping process occurs within the body of the DHGN
subnet. A condition in an SQL query activates the neuron that holds the
corresponding data field. Figure 8.11 shows the data representation in the
DHGN cloud data access scheme.
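The field-to-neuron mapping can be pictured with a small sketch. The text does not describe the internal structure of a DHGN neuron, so the classes below (FieldNeuron, Subnet) and their methods are hypothetical stand-ins: each "neuron" simply remembers which stored records hold which value for its field, and an equality condition activates the matching neuron.

```python
from collections import defaultdict


class FieldNeuron:
    """Hypothetical stand-in for a neuron bound to a single data field."""

    def __init__(self, field):
        self.field = field
        self.values = defaultdict(set)  # field value -> ids of records holding it

    def store(self, record_id, value):
        self.values[value].add(record_id)

    def activate(self, value):
        """Return the ids of stored records whose field equals `value`."""
        return set(self.values.get(value, set()))


class Subnet:
    """Hypothetical subnet on one data node: one neuron per data field."""

    def __init__(self, fields):
        self.neurons = {f: FieldNeuron(f) for f in fields}

    def store(self, record_id, record):
        for field, neuron in self.neurons.items():
            neuron.store(record_id, record[field])

    def select(self, conditions):
        """Evaluate a conjunction of equality conditions (WHERE a = x AND b = y)."""
        hits = [self.neurons[f].activate(v) for f, v in conditions.items()]
        return set.intersection(*hits) if hits else set()


subnet = Subnet(["name", "city"])
subnet.store(1, {"name": "Ann", "city": "Perth"})
subnet.store(2, {"name": "Bob", "city": "Melbourne"})
subnet.store(3, {"name": "Ann", "city": "Melbourne"})

by_city = subnet.select({"city": "Melbourne"})
by_both = subnet.select({"name": "Ann", "city": "Melbourne"})
```

Under these assumptions, a query such as WHERE city = 'Melbourne' touches only the neuron for the city field, and a conjunction of conditions intersects the record sets recalled by each activated neuron.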