be applied. For example, we can drop the column-store index; perform the
required INSERT, DELETE, or UPDATE operations; and then rebuild the
column-store index. Of course, building the index on large tables can be
costly, and if this procedure has to be followed on a regular basis, it may not
be practical. As another option, we can allocate data identified as static (or
rarely changing) to a main table with a column-store index defined over it.
Recent data, which are likely to change, can be stored in a separate table
with the same schema but without a column-store index, and the updates can
then be applied to this table. Note that this requires rewriting a query
as two queries, one against each table, and then combining the two result
sets with UNION ALL. The updating technique above shows one of the trade-
offs of having column storage as an index in a row-oriented database: the ad
hoc updating procedures described are performed automatically in most of
the other products discussed in this chapter. On the other hand, those
products are normally not appropriate for heavy transactional workloads.
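As a minimal sketch of the second option, assuming SQL Server's column-store index syntax and hypothetical tables sales_static and sales_recent:

    -- Static (rarely changing) data, with a column-store index over it;
    -- the table becomes read-only once the index is created
    CREATE TABLE sales_static (
        product_id INT, order_date DATE, amount DECIMAL(10,2) );
    CREATE NONCLUSTERED COLUMNSTORE INDEX cs_sales_static
        ON sales_static (product_id, order_date, amount);

    -- Recent data, with the same schema but no column-store index,
    -- so INSERT, DELETE, and UPDATE operations can be applied directly
    CREATE TABLE sales_recent (
        product_id INT, order_date DATE, amount DECIMAL(10,2) );
    INSERT INTO sales_recent VALUES (42, '2012-03-15', 99.50);

    -- A query over all sales is rewritten as two queries, one against
    -- each table, whose results are combined with UNION ALL
    SELECT product_id, SUM(amount) AS total
    FROM ( SELECT product_id, amount FROM sales_static
           UNION ALL
           SELECT product_id, amount FROM sales_recent ) AS s
    GROUP BY product_id;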
13.6 Real-Time Data Warehouses
Many current data warehousing applications must handle large volumes of
concurrent requests while maintaining adequate query response time and
must scale up as the data volume and number of users grow. This is
quite different from the early days of data warehousing, when just a few
users accessed the data warehouse. Moreover, most of these
applications need to remain continuously available, without a time window
for refreshing. These applications require a new approach to the extraction,
transformation, and loading (ETL) process studied in Chap. 8. Recall that
ETL processes periodically pull data from source systems to refresh the data
warehouse. This process is acceptable for many real-world data warehousing
applications. However, the new database technologies studied in this chapter
nowadays make it possible to achieve real-time data warehouses, which are
continuously fed from production systems while delivering consistent,
reliable data analysis results.
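To illustrate, a periodic refresh can be approximated in plain SQL as a micro-batch load executed on a short cycle; the staging table sales_staging (fed by the production system), the target table sales_fact, and the watermark column load_ts are hypothetical:

    -- Move only the rows that arrived since the last load; running this
    -- statement every few seconds instead of nightly approaches real time
    INSERT INTO sales_fact (product_id, order_date, amount, load_ts)
    SELECT product_id, order_date, amount, load_ts
    FROM sales_staging
    WHERE load_ts > (SELECT COALESCE(MAX(load_ts), '1900-01-01')
                     FROM sales_fact);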
As studied in this book, the life cycle of a data record in a business
intelligence environment starts with a business event taking place. ETL
processes then deliver the event record to the data warehouse. Finally,
analytical processing turns the data into information to help the decision-
making process, and a business decision leads to a corresponding action. To
approach real time, the time elapsed between the event and its consequent
action, called the data latency, needs to be minimized. Making rapid
decisions based on large volumes of data requires achieving low data latency,
sometimes at the expense of potential data inconsistency (e.g., late and/or
missing data) and of specialized hardware. In the general case, it is the data
acquisition process that introduces most of the data latency.