be applied. For example, we can drop the column-store index; perform the
required INSERT, DELETE, or UPDATE operations; and then rebuild the
column-store index. Of course, building the index on large tables can be
costly, and if this procedure has to be followed on a regular basis, it may not
be practical. As another option, we can allocate data identified as static (or
rarely changing) to a main table with a column-store index defined over it.
Recent data, which are likely to change, can be stored in a separate table
with the same schema but without a column-store index, and the updates can
then be applied to this table. Note that this requires rewriting a query
as two queries, one against each table, and then combining the two result
sets with UNION ALL. The updating technique above shows one of the trade-
offs of having column storage as an index in a row-oriented database: the ad
hoc updating procedures described are performed automatically in most of
the other products discussed in this chapter. On the other hand, those
products are normally not appropriate for heavy transactional workloads.
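As a minimal sketch of the second option, assuming SQL Server's column-store index syntax and hypothetical tables sales_static and sales_recent:

    -- Static (rarely changing) data, with a column-store index over it;
    -- the table becomes read-only once the index is created
    CREATE TABLE sales_static (
        product_id INT, order_date DATE, amount DECIMAL(10,2) );
    CREATE NONCLUSTERED COLUMNSTORE INDEX cs_sales_static
        ON sales_static (product_id, order_date, amount);

    -- Recent data, with the same schema but no column-store index,
    -- so INSERT, DELETE, and UPDATE operations can be applied directly
    CREATE TABLE sales_recent (
        product_id INT, order_date DATE, amount DECIMAL(10,2) );
    INSERT INTO sales_recent VALUES (42, '2012-03-15', 99.50);

    -- A query over all sales is rewritten as two queries, one against
    -- each table, whose results are combined with UNION ALL
    SELECT product_id, SUM(amount) AS total
    FROM ( SELECT product_id, amount FROM sales_static
           UNION ALL
           SELECT product_id, amount FROM sales_recent ) AS s
    GROUP BY product_id;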
13.6 Real-Time Data Warehouses
Many current data warehousing applications must handle large volumes of
concurrent requests while maintaining adequate query response time and
must scale up as the data volume and number of users grow. This is
quite different from the early days of data warehousing, when just a few
users accessed the data warehouse. Moreover, most of these
applications need to remain continuously available, without a time window
for refreshing. These applications require a new approach to the extraction,
transformation, and loading (ETL) process studied in Chap. 8. Recall that
ETL processes periodically pull data from source systems to refresh the data
warehouse. This process is acceptable for many real-world data warehousing
applications. However, the new database technologies studied in this chapter
nowadays make it possible to achieve real-time data warehouses, which are
continuously fed from production systems while delivering consistent,
reliable data analysis results.
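To illustrate, a periodic refresh can be approximated in plain SQL as a micro-batch load executed on a short cycle; the staging table sales_staging (fed by the production system), the target table sales_fact, and the watermark column load_ts are hypothetical:

    -- Move only the rows that arrived since the last load; running this
    -- statement every few seconds instead of nightly approaches real time
    INSERT INTO sales_fact (product_id, order_date, amount, load_ts)
    SELECT product_id, order_date, amount, load_ts
    FROM sales_staging
    WHERE load_ts > (SELECT COALESCE(MAX(load_ts), '1900-01-01')
                     FROM sales_fact);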
As studied in this book, the life cycle of a data record in a business
intelligence environment starts with a business event taking place. ETL
processes then deliver the event record to the data warehouse. Finally,
analytical processing turns the data into information to help the decision-
making process, and a business decision leads to a corresponding action. To
approach real time, the time elapsed between the event and its consequent
action, called the data latency, needs to be minimized. Making rapid
decisions based on large volumes of data requires achieving low data latency,
sometimes at the expense of potential data inconsistency (e.g., late and/or
missing data) and of specialized hardware. In the general case, it is the data
acquisition process that introduces most of the data latency.