This advantage is not necessarily all one way. Each column you need to retrieve
must be accessed separately, whereas you can retrieve an entire row in a single read. So the
more information you need from a row, the less performance advantage
a column-based approach offers. To take a simplistic example, if you want to read a single
row from a row-based store then that is one read. If that row has 15 columns then, in a
column-based store, that is, in theory, 15 reads. So there
is a trade-off between the number of rows you want to read and the number of columns,
together with the overhead of finding the rows/columns you need to read in the first place.
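The trade-off described above can be put into a few lines of arithmetic. The following is a minimal sketch of the simplistic model in the text only: the function names are illustrative, and it deliberately ignores block sizes, compression, caching, and the cost of locating the data, all of which matter in a real system.

```python
def row_store_reads(num_rows: int, columns_needed: int) -> int:
    """Simplistic model: one read fetches a whole row, however many columns are needed."""
    return num_rows


def column_store_reads(num_rows: int, columns_needed: int) -> int:
    """Simplistic model: each needed column of each row is a separate read."""
    return num_rows * columns_needed


# The example from the text: a single row with 15 columns, all of them needed.
print(row_store_reads(1, 15))     # reads in a row-based store
print(column_store_reads(1, 15))  # reads, in theory, in a column-based store
```

Under this model the column-based approach only breaks even when a single column is needed per row; its real advantage comes from not having to touch the columns a query does not ask for.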
Note
A further consideration is that there is a class of query that can be answered directly
from an index. These are known as “count queries.” Let's take, for example, the question
posed previously: Count the married, employed customers who own a house. If you have
a row-based database, and you have appropriate indexes defined, then you can resolve
these queries without having to read the data at all. Of course, in the case of a column-based
database the data is the index (or vice versa) so you should always be able to answer
count queries in this way.
Note
In a big data environment, count queries are common.
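To illustrate how a count query can be answered from indexes alone, here is a minimal sketch that uses one bitmap per attribute. It assumes a bitmap-style index, which is only one possible index structure and is not something the text specifies; the sample customers and the attribute names are invented for the example.

```python
# Hypothetical customer rows; in a real system these would stay on disk, unread.
customers = [
    {"married": True,  "employed": True,  "owns_house": True},
    {"married": True,  "employed": False, "owns_house": True},
    {"married": False, "employed": True,  "owns_house": True},
    {"married": True,  "employed": True,  "owns_house": False},
    {"married": True,  "employed": True,  "owns_house": True},
]


def build_bitmap(rows, attribute):
    """Build an integer bitmap with bit i set when row i has the attribute."""
    bitmap = 0
    for position, row in enumerate(rows):
        if row[attribute]:
            bitmap |= 1 << position
    return bitmap


# The indexes are built once, ahead of query time.
indexes = {name: build_bitmap(customers, name)
           for name in ("married", "employed", "owns_house")}


def count_matching(indexes, attributes):
    """Count rows having all attributes, using only the bitmaps (never the rows)."""
    combined = indexes[attributes[0]]
    for name in attributes[1:]:
        combined &= indexes[name]
    return bin(combined).count("1")


married_employed_owners = count_matching(
    indexes, ["married", "employed", "owns_house"])
print(married_employed_owners)
```

Once the bitmaps exist, the count is a bitwise AND followed by a bit count, so the query never returns to the customer rows themselves.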
Time-based Queries: The issue here is not so much performance as whether
relevant queries are possible at all. This is because you need not only extended SQL
to handle time-lapse queries but also the ability to store time-stamped transactions.
Neither capability is typically available in traditional RDBMS data stores. Conversely, there are
a number of column-based data stores that provide exactly these capabilities.
Note that a number of use cases require such capabilities, which go
beyond those of conventional databases. For example, in telecommunications
companies are required to retain call detail records, against which relevant queries can be run,
often on a time-lapsed basis. Similarly, you will want to be able to run time-based queries
against log information (from databases, system logs, web logs and so forth) as well as
e-mails and other corporate data that you may need for evidentiary reasons.
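As a rough illustration of a time-based query over time-stamped records, the sketch below keeps call detail records ordered by timestamp and selects a time window with a binary search. The record layout, phone numbers, and function name are assumptions made for the example; they do not describe any particular data store or its query language.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical call detail records: (timestamp, caller, duration in seconds),
# kept sorted by timestamp.
call_records = [
    (datetime(2013, 1, 1, 9, 0),  "555-0100", 120),
    (datetime(2013, 1, 1, 9, 30), "555-0101", 45),
    (datetime(2013, 1, 1, 10, 0), "555-0100", 300),
    (datetime(2013, 1, 2, 8, 15), "555-0102", 60),
]

# A separate sorted list of timestamps lets us binary-search the window bounds.
timestamps = [record[0] for record in call_records]


def records_between(start, end):
    """Return records with start <= timestamp < end (a half-open window)."""
    low = bisect_left(timestamps, start)
    high = bisect_left(timestamps, end)
    return call_records[low:high]


window = records_between(datetime(2013, 1, 1, 9, 0),
                         datetime(2013, 1, 1, 10, 0))
total_seconds = sum(duration for _, _, duration in window)
print(len(window), total_seconds)
```

The point of the sketch is that the timestamp is stored with every record and the data is organized by it, which is what makes window-style queries cheap to answer.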
Requirements for the Next Generation Data Warehouses
In order to provide the best possible performance to the largest number of users, data
warehouses are significantly pre-designed. While logically this may be a reflection of
the data model that underpins the data warehouse, in physical terms this means
pre-building indexes, carefully partitioning data, parallel disk striping, developing
pre-aggregated tables, etc.
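Of the physical design steps listed above, pre-aggregation is the easiest to show in miniature. The sketch below is illustrative only: the sales rows, the (region, month) grouping, and the function name are assumptions, and a real warehouse would maintain such a table inside the database rather than in application code.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, month, amount).
sales = [
    ("north", "2013-01", 100),
    ("north", "2013-01", 250),
    ("south", "2013-01", 80),
    ("north", "2013-02", 40),
]


def build_aggregate(rows):
    """Pre-compute total amount per (region, month) in one pass over the detail rows."""
    totals = defaultdict(int)
    for region, month, amount in rows:
        totals[(region, month)] += amount
    return dict(totals)


# Built once at load time; later queries read this small table, not the detail rows.
monthly_totals = build_aggregate(sales)
print(monthly_totals[("north", "2013-01")])
```

The cost is that the grouping has to be chosen in advance, which is what "significantly pre-designed" amounts to in practice: a query that needs a different grouping gains nothing from this table.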
However, from our discussions so far, we have also seen that the scale of big data and the
type of workload play a significant role in database design considerations. On the basis
 
 