Application Architectures for Big Data and Analytics - Big Data Imperatives


Table 5-1. (continued)

Data Integrity and Standards:

Big Data Warehouse Characteristics:

• Data integration standards are loosely defined and mostly driven by programmer or application style; metadata management is lacking, and business rules and transformations are an integral part of the programs.

• Data and data processing programs are highly distributed.

EDW Characteristics:

• Driven by relational database management system principles and architecture approaches (ETL, ELT); data consistency (referential integrity and business rules) and availability drive major development activities.

• Data is primarily centralized, and data processing programs follow a well-defined execution approach; in most cases these programs are sequential.

Data Design Principles for Big Data Solutions

The distributed nature of big data means that data designs must focus on partition tolerance. Second, to solve the scale issue, the data also needs to be distributed across many clusters and nodes, so data designs should also explicitly account for availability. Two methods are broadly applied to address the partition-tolerance and availability requirements:

• Vertical Scaling

• Horizontal Scaling

Vertical Scaling. Vertical scaling simply involves moving the application to larger computers. This approach is also known as “scale up.” It works well for data up to a point, but it has limitations: the application can eventually outgrow the capacity of even the largest machine. It can also be expensive, because you may have to keep buying newer, bigger, and better machines to cope, and this can lead to vendor lock-in.

Horizontal Scaling. This approach, also known as “scale out,” offers more flexibility but is far more complex to design and manage. Horizontal scaling is done in two ways. The first is functional scaling, which involves organizing similar data into groups (either by their functional alignment or because some data entities are always queried together) and spreading these groups across databases. The second is sharding, which involves splitting the data within each functional area across multiple databases.
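The two horizontal-scaling techniques can be combined when deciding where a record lives. The Python sketch below first picks a database by functional group and then picks a shard inside that database by hashing the record key. The entity names, database names, shard count, and the use of an MD5-based hash are illustrative assumptions, not part of any particular product.

```python
import hashlib

# Hypothetical functional groups: entity types that are always queried
# together are assigned to the same database (functional scaling).
FUNCTIONAL_GROUPS = {
    "customer": "customer_db",
    "customer_address": "customer_db",
    "order": "order_db",
    "order_line": "order_db",
    "product": "catalog_db",
}

# Hypothetical number of shards inside each functional database (sharding).
SHARDS_PER_DB = 4


def functional_db(entity_type: str) -> str:
    """Functional scaling: return the database that holds this entity's group."""
    return FUNCTIONAL_GROUPS[entity_type]


def shard_index(key: str, num_shards: int = SHARDS_PER_DB) -> int:
    """Sharding: map a record key to a shard number with a stable hash.

    MD5 is used only because it gives the same result in every process;
    Python's built-in hash() of a str is randomized per process.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def locate(entity_type: str, key: str) -> str:
    """Combine both steps: functional database first, then shard within it."""
    return f"{functional_db(entity_type)}_shard{shard_index(key)}"


if __name__ == "__main__":
    for entity, key in [("customer", "C-1001"), ("customer_address", "C-1001"),
                        ("order", "O-77"), ("product", "P-5")]:
        print(entity, key, "->", locate(entity, key))
```

Because a customer and its address share both a functional group and a key, they land on the same shard and can be queried together. Simple modulo hashing keeps routing stateless, but changing the shard count remaps most keys; schemes such as consistent hashing are commonly used to reduce that data movement.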

Before we delve deeper into data design principles for big data solutions, you should first understand a few established theories that govern data design approaches.

ACID. ACID stands for atomicity, consistency, isolation, and durability. Relational database management systems, built on E. F. Codd's relational model, adopted the ACID approach for data design. In essence, relational database systems ensure atomicity (a transaction is all or nothing), consistency (a transaction takes the database from one valid state to another, so only data that satisfies the defined rules is written), isolation (concurrent transactions do not interfere with one another; the result is as if they had run serially) and durability (once a transaction is committed, its changes survive crashes and power failures).
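Atomicity and consistency are easy to observe in any transactional database. The sketch below uses Python's built-in sqlite3 module as a stand-in relational database; the accounts table, its CHECK constraint, and the transfer function are hypothetical examples. A transfer either applies both updates or neither, and a debit that would break the no-negative-balance rule is rejected. Isolation and durability are not demonstrated here.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the CHECK constraint is a consistency rule
    # (no account may go negative).
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commits the initial rows
        conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                         [("alice", 100), ("bob", 50)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move funds as one atomic transaction.

    Using the connection as a context manager commits if the block
    finishes normally and rolls back if an exception escapes it.
    """
    with conn:
        debit = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        credit = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        if debit.rowcount != 1 or credit.rowcount != 1:
            # Raising here undoes the debit that already ran: all or nothing.
            raise ValueError("unknown account")


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    conn = setup()
    transfer(conn, "alice", "bob", 30)
    print(balances(conn))  # committed: alice 70, bob 80
    try:
        transfer(conn, "alice", "carol", 20)  # "carol" does not exist
    except ValueError as exc:
        print("rolled back:", exc, balances(conn))
    try:
        transfer(conn, "alice", "bob", 500)  # would make alice negative
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc, balances(conn))
```

In the second call the debit succeeds before the missing credit target is detected, yet the rollback restores the original balance; in the third, the CHECK constraint stops invalid data from ever being written.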
