New Technologies Applied to Data Warehousing - Data Warehousing in the Age of Big Data

The distinct advantages of this architectural approach include:

● All the data is stored locally across the nodes, and each node manages its portion of the data and the queries assigned to it by the master node.
● Data is striped and mirrored across at least two nodes, which improves scalability when large query workloads are submitted.
● Mirroring the data helps achieve workload balance and supports failover in the event of an unplanned outage.
● When scalability is needed, the work can be divided among the nodes in discrete chunks, and more nodes can be added, configured, and put to use by the system with minimal intervention.
● A node can be assigned a specific role or set of roles, making it available for querying, loading, and managing data.
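The striping, mirroring, and master-node dispatch described in the list above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the six-node cluster size, the CRC32 hash used for placement, the "next node holds the mirror" rule, and the function names are hypothetical and do not describe any particular appliance.

```python
import zlib

NUM_NODES = 6  # hypothetical cluster size


def primary_node(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the node that owns a row by hashing its distribution key."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def mirror_node(primary: int, num_nodes: int = NUM_NODES) -> int:
    """Assumed rule: the mirror copy lives on the next node in the ring."""
    return (primary + 1) % num_nodes


def distribute(rows, key_field, num_nodes=NUM_NODES):
    """Stripe rows across nodes and keep a second copy on the mirror node."""
    nodes = {n: {"primary": [], "mirror": []} for n in range(num_nodes)}
    for row in rows:
        p = primary_node(str(row[key_field]), num_nodes)
        nodes[p]["primary"].append(row)
        nodes[mirror_node(p, num_nodes)]["mirror"].append(row)
    return nodes


def scatter_gather_sum(nodes, field, failed=frozenset()):
    """Master-style query: every live node sums its own slice.

    If a node is down, the next node answers for it from its mirror copy.
    Two adjacent failures would lose data; this sketch ignores that case.
    """
    num_nodes = len(nodes)
    total = 0
    for n in range(num_nodes):
        if n in failed:
            continue
        total += sum(r[field] for r in nodes[n]["primary"])
        if (n - 1) % num_nodes in failed:
            total += sum(r[field] for r in nodes[n]["mirror"])
    return total


rows = [{"id": i, "amount": i} for i in range(100)]
cluster = distribute(rows, "id")
healthy = scatter_gather_sum(cluster, "amount")
degraded = scatter_gather_sum(cluster, "amount", failed={2})
print(healthy == degraded)  # the mirror covers the failed node's slice
```

Because every row exists on two nodes, the query returns the same total whether or not a single node is down, which is the availability benefit the list describes.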

In a nutshell, the appliance architecture is a specialized configuration of multiple SMP nodes in one physical device, with a custom operating system layer added to a Linux or Unix platform. It is managed by a smart controller and has its own internal network switch to move large volumes of data across the nodes, bypassing the outside network completely. Because the appliance is largely self-managing, system administrators and database administrators (DBAs) rarely need to intervene to sustain performance and scalability. Appliances also provide the flexibility to deploy on commodity hardware platforms, which lowers the cost of operation and can shorten time to market. The lower price point lets appliance users add more nodes as needed without breaking the bank.

The other aspect of the appliance worth exploring and understanding before you select an appliance or migrate to one is the data architecture. The appliance can support third normal form (3NF), star schema, or hybrid data architectures, depending on the user's needs. The data distribution and data storage techniques are what allow the appliance to scale with workloads and users; we discuss them next.

Data distribution in the appliance

Figure 9.2 shows a typical data distribution across the data warehouse appliance. The figure shows data distributed across multiple nodes. In addition, nodes 1, 3, and 5 typically mirror data slices across one another, as do nodes 2, 4, and 6, while nodes 7 and 8 are held on standby for use if any of the other nodes suffers an outage.
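One way to picture this layout is as a small lookup structure. The sketch below is only an assumption about how the mirroring could be arranged: the text gives the node groupings and the standby nodes, but the ring pairing inside each group and the `fail_over` helper are hypothetical.

```python
# Node groupings taken from the description of Figure 9.2.
MIRROR_GROUPS = [[1, 3, 5], [2, 4, 6]]
STANDBY = [7, 8]


def mirror_map(groups):
    """Map each node to the peer that holds a copy of its data slice.

    Assumption: within a group, each node is mirrored by the next one (a ring).
    """
    mapping = {}
    for group in groups:
        for i, node in enumerate(group):
            mapping[node] = group[(i + 1) % len(group)]
    return mapping


def fail_over(failed_node, mirrors, standby):
    """Promote a standby node and report which peer it would rebuild from."""
    if not standby:
        raise RuntimeError("no standby node left")
    replacement = standby.pop(0)
    source = mirrors[failed_node]
    return replacement, source


mirrors = mirror_map(MIRROR_GROUPS)
spares = list(STANDBY)
replacement, source = fail_over(3, mirrors, spares)
print(f"node {replacement} replaces node 3, rebuilding from node {source}")
```

Keeping the mirror for a slice inside the same group means a standby node needs to read from only one surviving peer to take over.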

This type of data layout requires the designer or architect to:

● Understand the data and the special requirements for handling data.
● Understand the underlying relationships.
● Understand the data skew.
● Understand the data volume.
● Understand the data growth.
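Of the items above, data skew is the easiest to check before committing to a distribution key. The sketch below is a rough illustration, not a vendor tool: the CRC32 placement, the four-node cluster, and the sample keys (`customer_ids`, `countries`) are all assumptions. It counts how many rows each node would receive and compares the largest slice with the average.

```python
import zlib
from collections import Counter


def rows_per_node(keys, num_nodes):
    """Count how many rows each node would receive for a candidate key."""
    counts = Counter(
        zlib.crc32(str(k).encode("utf-8")) % num_nodes for k in keys
    )
    return [counts.get(n, 0) for n in range(num_nodes)]


def skew_ratio(counts):
    """Largest slice divided by the average slice; 1.0 is perfectly even."""
    avg = sum(counts) / len(counts)
    return max(counts) / avg if avg else 0.0


# Hypothetical candidate keys: many distinct values versus only two values.
customer_ids = range(10_000)
countries = ["US"] * 9_000 + ["CA"] * 1_000

for name, keys in [("customer_id", customer_ids), ("country", countries)]:
    counts = rows_per_node(keys, num_nodes=4)
    print(name, counts, round(skew_ratio(counts), 2))
```

A key with few distinct values piles most rows onto one or two nodes, so those nodes become the bottleneck for every query; a high-cardinality key generally spreads the rows far more evenly.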

Once you have the data architecture mapped, the distribution of data, including the striping and mirroring, provides the needed performance boost. That boost comes from having data available in more than one storage location, from workloads executing on noncompeting infrastructure, and from minimal data movement within the infrastructure.
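The point about minimal data movement can be made concrete with a small sketch. It assumes hash-based placement (CRC32 here) and a hypothetical `orders` table. It counts how many rows would have to be shipped to another node before a join, depending on whether the table is already distributed on the join key.

```python
import zlib


def node_for(value, num_nodes):
    """Assumed placement rule: hash the value and take it modulo the node count."""
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes


def rows_to_move(rows, current_key, join_key, num_nodes):
    """Count rows whose current node differs from the node the join needs."""
    return sum(
        1
        for r in rows
        if node_for(r[current_key], num_nodes) != node_for(r[join_key], num_nodes)
    )


# Hypothetical table: 50 orders spread over 7 customers.
orders = [{"order_id": i, "customer_id": i % 7} for i in range(50)]

# Distributed on the join key: matching rows are already on the same node.
print(rows_to_move(orders, "customer_id", "customer_id", num_nodes=4))

# Distributed on a different key: some rows must cross the internal network.
print(rows_to_move(orders, "order_id", "customer_id", num_nodes=4))
```

When tables that are frequently joined share a distribution key, each node can join its own slices locally, which is one way a layout keeps data movement low.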
