Pattern Recognition within Coarse-Grained Networks - Internet-Scale Pattern Recognition

vertically, i.e., different records are distributed across the network. However, there are situations where parts of a record are stored in different locations, for instance when a large database table is split into several sub-tables. In addition, MapReduce operations produce many intermediate entities between the map and reduce functions. These entities, as shown in Figure 8.9, take the form of intermediate files whose contents must be sorted before being fed to the reduce function. These system-level sorts and redistributions incur additional processing and communication costs.
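
The intermediate sort step described above can be illustrated with a minimal, single-process sketch. It is not Hadoop code: the record layout, the `dept` and `salary` field names, and the three function names are assumptions made for illustration only. It shows why the reduce step depends on sorted intermediate data, which is the source of the extra processing cost.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Emit (key, value) pairs; here, (department, salary). Hypothetical fields."""
    for rec in records:
        yield rec["dept"], rec["salary"]

def shuffle_and_sort(pairs):
    """Intermediate step: sort pairs by key so that equal keys become adjacent."""
    return sorted(pairs, key=itemgetter(0))

def reduce_phase(sorted_pairs):
    """Sum values per key; groupby only works because the input is key-sorted."""
    return {key: sum(value for _, value in group)
            for key, group in groupby(sorted_pairs, key=itemgetter(0))}

records = [
    {"dept": "sales", "salary": 10},
    {"dept": "ops", "salary": 7},
    {"dept": "sales", "salary": 5},
]
intermediate = shuffle_and_sort(map_phase(records))  # stands in for intermediate files
totals = reduce_phase(intermediate)
```

In a real cluster the sorted pairs would also be moved between machines, which adds the communication cost on top of the sorting cost.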

In view of these issues, a data access scheme is considered that enables retrieval across multiple records and data segments in a single-cycle, parallel manner. The access mechanism is implemented according to the nature of the database. Retrieval is conducted on the set of records that reside in a particular node, and the records themselves are not altered. Because the approach is parallel, the records in each storage node are analyzed locally without incurring any communication costs. A distributed pattern matching/recognition approach, such as the DHGN, can be used to retrieve data from the cloud.
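
A rough sketch of node-local, read-only retrieval follows. Threads stand in for storage nodes, and the `NODES` layout, the field names, and both function names are hypothetical. The point it illustrates is that each scan touches only the records held by its own node, and the only data that leaves a node is its list of matches.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: node name -> records stored on that node.
NODES = {
    "node-a": [{"id": 1, "city": "Perth"}, {"id": 2, "city": "Oslo"}],
    "node-b": [{"id": 3, "city": "Oslo"}],
    "node-c": [{"id": 4, "city": "Lima"}],
}

def local_scan(node_records, field, value):
    """Read-only scan of one node's own records; no other node is contacted."""
    return [rec["id"] for rec in node_records if rec.get(field) == value]

def parallel_retrieve(nodes, field, value):
    """Start every node's scan concurrently, then collect the match lists."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = {name: pool.submit(local_scan, recs, field, value)
                   for name, recs in nodes.items()}
        return {name: fut.result() for name, fut in futures.items()}

matches = parallel_retrieve(NODES, "city", "Oslo")
```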

8.3.2 DHGN Approach for Cloud Data Access

Through the redesign of the data management architecture, data records are treated as patterns. This treatment enables data storage and retrieval by association, over and above the existing simple data referential mechanisms. Database processing and dynamic load handling are performed through a distributed pattern recognition approach that is implemented in integrated, loosely coupled computational networks. This is followed by a divide-and-distribute approach that allows these networks to be distributed dynamically within the cloud. The DHGN cloud access scheme relies on communications between adjacent nodes; decentralized content location schemes are implemented to discover the adjacent nodes in a minimal number of hops. A GN-based algorithm for optimally distributing the DHGN subnets (clusters or sub-domains) across the cloud nodes is provided to automate the bootstrapping of the distributed application and to investigate dynamic load balancing over the network. Figure 8.10 shows how DHGN subnets are positioned in the cloud environment using Hadoop's distributed file system (DFS) architecture.

Note that the DHGN subnets perform data mapping on each of the data nodes within the DFS infrastructure. Within each DHGN subnet, the records are stored in an associative pattern, and each DHGN neuron corresponds to a single data field. The mapping process occurs within the body of the DHGN subnet. A condition in an SQL query activates the neuron that holds the respective data field. Figure 8.11 shows the data representation in the DHGN data access cloud scheme.
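
The one-neuron-per-field idea can be pictured with the toy model below. It is an assumption-laden illustration, not the DHGN algorithm: `FieldNeuron` and `ToySubnet` are hypothetical names, each "neuron" is reduced to a value-to-record lookup, and only equality conditions joined by AND are handled. It shows how a condition such as `WHERE name = 'Ana' AND city = 'Oslo'` might activate only the neurons for the fields it names.

```python
from collections import defaultdict

class FieldNeuron:
    """Toy stand-in for a neuron bound to one data field (hypothetical)."""
    def __init__(self, field):
        self.field = field
        self.memory = defaultdict(set)  # field value -> ids of records holding it

    def store(self, record_id, value):
        self.memory[value].add(record_id)

    def activate(self, value):
        """Return the ids of records whose field equals the condition value."""
        return set(self.memory.get(value, set()))

class ToySubnet:
    """One subnet per data node, with one neuron per field of the local records."""
    def __init__(self, fields):
        self.neurons = {f: FieldNeuron(f) for f in fields}

    def store(self, record_id, record):
        for field, value in record.items():
            self.neurons[field].store(record_id, value)

    def where(self, **conditions):
        """Equality conditions joined by AND; only the named fields are activated."""
        hits = [self.neurons[f].activate(v) for f, v in conditions.items()]
        return set.intersection(*hits) if hits else set()

subnet = ToySubnet(["name", "city"])
subnet.store(1, {"name": "Ana", "city": "Oslo"})
subnet.store(2, {"name": "Ben", "city": "Oslo"})
subnet.store(3, {"name": "Ana", "city": "Lima"})
result = subnet.where(name="Ana", city="Oslo")
```

Because each subnet only holds the records of its own data node, a query of this kind can be answered locally on every node and the partial results merged afterwards.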
