On the other hand, the new requirements discussed at the beginning of
this section led to the ELT paradigm, depicted in Fig. 13.7b. Here, data are
extracted from the data sources into the staging database using any available
data connectivity tool, not just specialized ETL middleware. At this staging
database, integrity and business rule checks can be applied, and relevant
corrections can be made. After this, the source data are loaded into the
warehouse, which thus holds a validated off-line copy of the source data.
Once in the warehouse, transformations are performed
to take the data to their target output format. In other words, while in ETL
the transformation happens at the ETL tool, in ELT it happens at the
database. In this way, the extraction and loading processes can be isolated
from the transformation process, allowing the user to include data that may
be needed in the future. Even the whole data source could be loaded into the
warehouse. This, combined with the isolation of the transformation process,
means that future requirements can easily be incorporated into the warehouse
structure, reducing the risk of a project. Further, the tools provided with
the database engine can be used for this process, reducing the need to
implement and learn specialized ETL tools.
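The ELT flow described above can be sketched in a few lines. The following is a minimal illustration only, assuming an in-memory SQLite database that plays the role of both the staging database and the warehouse. The table names (staging_sales, wh_sales, wh_sales_by_region), the function run_elt, and the two validation rules are hypothetical and chosen just for this example.

```python
import sqlite3


def run_elt(source_rows):
    """Run a toy ELT pipeline: extract, validate, load, then transform in SQL."""
    db = sqlite3.connect(":memory:")

    # Extract: copy the raw source rows into a staging table, untransformed.
    db.execute("CREATE TABLE staging_sales (id INTEGER, region TEXT, amount REAL)")
    db.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", source_rows)

    # Integrity and business rule checks at the staging database
    # (hypothetical rules: a region is required, amounts cannot be negative).
    db.execute("DELETE FROM staging_sales WHERE region IS NULL OR amount < 0")

    # Load: copy the validated rows into the warehouse as they are.
    db.execute("CREATE TABLE wh_sales AS SELECT * FROM staging_sales")

    # Transform: performed by the database engine itself, after loading.
    db.execute(
        """
        CREATE TABLE wh_sales_by_region AS
        SELECT region, SUM(amount) AS total
        FROM wh_sales
        GROUP BY region
        """
    )

    result = db.execute(
        "SELECT region, total FROM wh_sales_by_region ORDER BY region"
    ).fetchall()
    db.close()
    return result


rows = [
    (1, "north", 100.0),
    (2, "south", 50.0),
    (3, None, 20.0),      # rejected: missing region
    (4, "north", -5.0),   # rejected: negative amount
    (5, "north", 25.0),
]
print(run_elt(rows))  # [('north', 125.0), ('south', 50.0)]
```

Note that the aggregation step uses only SQL executed by the database engine. Because the validated detail rows remain in wh_sales, a new requirement can be met later by adding another transformation, without repeating extraction and loading.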
We must keep in mind that ELT is an emerging paradigm that, although
promising, still needs to be developed further. This paradigm relies, in part, on
high-speed data loading, probably using large parallel DBMSs that take
advantage, for example, of technologies like MapReduce, studied in Sect. 13.1.
13.8 Summary
We have studied the changes that big data analytics requirements are
introducing in the data warehousing world and the answers that academia
and industry have devised for them. We presented the MapReduce
model and its most popular implementation, Hadoop. We also presented
two high-level query languages for Hadoop, namely, Pig Latin and HiveQL.
We then studied two database architectures that are gaining momentum
in data warehousing and business intelligence: column-store databases and
in-memory database systems (IMDBSs). We described the main characteristics of some of the database
systems based on these technologies: Vertica, MonetDB, SAP HANA,
Oracle TimesTen, and Microsoft xVelocity. Finally, we discussed two modern
paradigms increasingly used in data warehousing and business intelligence:
real-time data warehousing and ELT. Both paradigms are possible thanks to
the technologies studied in this chapter.