Table 5-1. (continued)

Data Integrity and Standards
  Big Data Warehouse Characteristics: Data integration standards are loosely defined, mostly programmer or application style driven, lack of metadata management, business rules and transformations an integral part of the programs.
  EDW Characteristics: Driven by relational database management systems principles and architecture approaches (ETL, ELT), data consistency (referential integrities and business rules) and availability drives major development activities.

  Big Data Warehouse Characteristics: Data and data processing programs are highly distributed.
  EDW Characteristics: Data is primarily centralized and data processing programs follow a well-defined execution approach, in most cases these programs are sequential.

Data Design Principles for Big Data Solutions
Because big data is distributed by nature, data designs must first focus on
partition tolerance. Second, to solve the scale issue, the data must be distributed
across many clusters and nodes, so data designs should also explicitly account for
availability. Two methods are broadly applied to address the partition-tolerance
and availability requirements:
Vertical Scaling
Horizontal Scaling
Vertical Scaling. Vertical scaling simply involves moving the application to larger
computers. This approach is also known as “scale up.” It works quite well for data
but has limitations: the data can eventually outgrow the capacity of the machine. It
can also be expensive, as you may have to buy newer, bigger, and better machines to
cope, and this could lead to vendor lock-in.
Horizontal Scaling. This approach, also known as “scale out,” offers more flexibility
but is far more complex to manage and design. Horizontal scaling can be done in two
ways. The first is functional scaling, which involves organizing similar data into
groups (either by functional alignment or because some data entities are always
queried together) and spreading these groups across databases. The second is
sharding, which involves splitting the data within each area of functionality across
multiple databases.
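The two horizontal scaling approaches can be illustrated with a minimal Python sketch. It is not a production router: the database names, the entity names, and the choice of a hash-modulo scheme are all assumptions made for illustration, and real systems may route by key ranges or lookup tables instead.

```python
import hashlib
from typing import Dict, Sequence

# Hypothetical functional groups: each kind of data lives in its own database.
FUNCTIONAL_DATABASES: Dict[str, str] = {
    "customers": "customers_db",
    "orders": "orders_db",
}

# Hypothetical shards inside the "orders" functional area.
ORDER_SHARDS = ["orders_db_0", "orders_db_1", "orders_db_2", "orders_db_3"]


def functional_database(entity: str) -> str:
    """Functional scaling: route a request by the kind of data it touches."""
    return FUNCTIONAL_DATABASES[entity]


def shard_for(key: str, shards: Sequence[str]) -> str:
    """Sharding: route a record to one database using a stable hash of its key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

A call such as shard_for("order-1001", ORDER_SHARDS) always returns the same shard for the same key. One trade-off of the hash-modulo choice is that changing the number of shards remaps most keys, which is part of why scaling out is harder to manage than scaling up.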
Before we delve deep into data design principles for big data solutions, you should
first understand a few established theories governing data design approaches.
ACID. ACID stands for atomicity, consistency, isolation, and durability. Following
the principles of Codd's relational model, relational database management systems
adopted the ACID approach for data design. In essence, relational database systems
ensure atomicity (a transaction is all or nothing), consistency (only valid data is
written to the database), isolation (concurrent transactions do not interfere with one
another, so the result is as if they had run serially) and durability (once a
transaction is committed, its changes survive subsequent failures).
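Atomicity and consistency can be seen in a small sketch using Python's built-in sqlite3 module. The accounts table, the account names, and the transfer helper are hypothetical, chosen only for illustration; isolation and durability are not exercised here.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint is the consistency rule: no negative balances.
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst, then debit src, as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balances as a name -> balance mapping."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

If a transfer would overdraw the source account, the debit fails the constraint and the earlier credit is undone as well, so the balances are left exactly as they were before the call.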
 
 