Database Reference

While loading this data, we can use various techniques like ETL (Extract, Transform, and Load), ELT (Extract, Load, and Transform), or ETLT (Extract, Transform, Load, and Transform).

• Extract, Transform, and Load: Data is transformed against a set of business rules before it is loaded into a data sandbox for analysis.

• Extract, Load, and Transform: In this case, the raw data is loaded into a data sandbox and then transformed as a part of analysis. This option is generally recommended over ETL, because transforming data beforehand means cleaning it upfront, which can result in data condensation and loss.

• Extract, Transform, Load, and Transform: In this case, we would see two levels of transformations:

    • Level 1 transformation could include steps that involve reduction of data noise (irrelevant data)

    • Level 2 transformation is similar to what we understood in ELT

In both ELT and ETLT cases, we can gain the advantage of preserving the raw data. One basic assumption for this process is that data would be voluminous, and the requirement for tools and processes would be defined on this assumption.
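The difference between the three loading techniques can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the field names, the noise rule, and the per-channel total used as a "business rule" are all hypothetical, and a dictionary stands in for the data sandbox.

```python
# Hypothetical source rows; field names and values are illustrative only.
SOURCE = [
    {"id": 1, "amount": "10.5", "channel": "web"},
    {"id": 2, "amount": "n/a", "channel": "web"},    # noise: unusable amount
    {"id": 3, "amount": "4.0", "channel": "store"},
]


def extract():
    """Return a copy of the source rows so the source itself is never mutated."""
    return [dict(row) for row in SOURCE]


def is_numeric(text):
    """True if the text can be parsed as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def reduce_noise(rows):
    """Level 1 transformation (assumed rule): drop rows with a non-numeric amount."""
    return [row for row in rows if is_numeric(row["amount"])]


def apply_business_rules(rows):
    """Assumed business rule: total amount per channel, skipping unusable rows.

    The result is condensed: individual rows can no longer be recovered from it.
    """
    totals = {}
    for row in rows:
        if is_numeric(row["amount"]):
            totals[row["channel"]] = totals.get(row["channel"], 0.0) + float(row["amount"])
    return totals


def etl():
    # Transform first; only the condensed result reaches the sandbox.
    return {"loaded": apply_business_rules(extract())}


def elt():
    # Load the raw rows, then transform as part of the analysis.
    loaded = extract()
    return {"loaded": loaded, "analysis": apply_business_rules(loaded)}


def etlt():
    # Level 1 (noise reduction) before loading, level 2 during analysis.
    loaded = reduce_noise(extract())
    return {"loaded": loaded, "analysis": apply_business_rules(loaded)}
```

In this sketch, `etl()` leaves only the per-channel totals in the sandbox, while `elt()` and `etlt()` keep row-level data available for further exploration (all rows for ELT, the rows that survive noise reduction for ETLT).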

The idea is to have access to clean data in the database to analyze data in its original form to explore the nuances in data. This phase requires domain experts and database specialists. Tools like Hadoop can be leveraged. We will learn more on the exploration/transformation techniques in the coming chapters.

Phase 4 - model

This phase has two important steps and can be highly iterative. The steps are:

• Model design

• Model execution

In the model designing step, we would identify the appropriate/suitable model given

a deep understanding of the requirement and data. This step involves understanding

the attributes of data and the relationships. We will consider the inputs/data and then

examine if these inputs correlate to the outcome we are trying to predict or analyze.
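One simple way to examine whether an input correlates with the outcome is a Pearson correlation coefficient. The following is a minimal sketch, not a prescribed method: the column names and values are made up for illustration, and returning 0.0 for a constant column is a choice made here for convenience (the coefficient is strictly undefined in that case).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption of this sketch: treat a constant column as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical candidate inputs and outcome; the values are illustrative only.
inputs = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0],
    "store_id": [7.0, 7.0, 7.0, 7.0, 7.0],   # constant, so it carries no signal
    "discount": [5.0, 3.0, 4.0, 1.0, 2.0],
}
outcome = [2.0, 4.0, 6.0, 8.0, 10.0]

# Score every candidate input against the outcome.
scores = {name: pearson(column, outcome) for name, column in inputs.items()}

# Rank inputs by the strength of their linear association with the outcome.
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
```

A coefficient near 1 or -1 suggests a strong linear relationship and a value near 0 suggests little linear relationship. The coefficient captures only linear association, so a low score does not by itself rule an input out.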

As we aim to capture the most relevant variables/predictors, we would need to be

vigilant for any data modeling or correlation problems. We can choose to analyze
