While loading this data, we can use techniques such as ETL (Extract, Transform, and Load), ELT (Extract, Load, and Transform), or ETLT (Extract, Transform, Load, and Transform).
• Extract, Transform, and Load : Data is transformed against a set of business rules before it is loaded into a data sandbox for analysis.
• Extract, Load, and Transform : In this case, the raw data is loaded into a data sandbox and then transformed as part of the analysis. This option is recommended over ETL because transforming data before loading means cleaning it upfront, which can result in data condensation and loss.
• Extract, Transform, Load, and Transform : In this case, there are two levels of transformation:
• Level 1 transformation includes steps that reduce data noise (irrelevant data)
• Level 2 transformation is similar to the transformation in ELT
In both the ELT and ETLT cases, we gain the advantage of preserving the raw data. One basic assumption for this process is that the data will be voluminous, and the requirements for tools and processes are defined based on this assumption.
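The ELT approach described above can be sketched with a small example. This is a minimal illustration, assuming an in-memory SQLite database as the "sandbox" and made-up table and column names (`raw_sales`, `sales`); real pipelines would use a warehouse or data lake instead.

```python
import sqlite3

# Extract: raw records as they arrive, noise and gaps included
raw_rows = [
    ("2021-01-01", " 100 "),   # whitespace noise in the raw value
    ("2021-01-02", "250"),
    ("2021-01-03", None),      # missing value, preserved in the raw layer
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (day TEXT, amount TEXT)")

# Load: data lands in the sandbox unmodified, so nothing is lost upfront
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)

# Transform: cleaning happens later, as part of the analysis, while the
# raw_sales table still holds every original row
conn.execute("""
    CREATE TABLE sales AS
    SELECT day, CAST(TRIM(amount) AS INTEGER) AS amount
    FROM raw_sales
    WHERE amount IS NOT NULL
""")

clean = conn.execute("SELECT day, amount FROM sales ORDER BY day").fetchall()
print(clean)  # [('2021-01-01', 100), ('2021-01-02', 250)]
```

An ETL pipeline would instead run the cleaning step before the `INSERT`, so the dropped and trimmed values would never reach the sandbox; here the raw layer can always be re-transformed under different rules.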
form to explore the nuances in data. This phase requires domain experts and database specialists. Tools like Hadoop can be leveraged. We will learn more about the exploration/transformation techniques in the coming chapters.
Phase 4 - model
This phase has two important steps and can be highly iterative. The steps are:
• Model design
• Model execution
In the model design step, we identify a suitable model based on a deep understanding of the requirement and the data. This step involves understanding the attributes of the data and the relationships between them. We consider the inputs/data and then examine whether these inputs correlate with the outcome we are trying to predict or analyze.
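This correlation check can be sketched in a few lines. The sketch below uses a plain-Python Pearson correlation and entirely hypothetical variables (`ad_spend`, `store_id`, `revenue`) to show how one candidate input can carry far more signal about the outcome than another:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: two candidate predictors and the outcome to model
ad_spend = [10, 20, 30, 40, 50]
store_id = [3, 1, 4, 1, 5]        # arbitrary label, expected to carry little signal
revenue  = [12, 24, 33, 41, 52]   # the outcome we are trying to predict

for name, xs in [("ad_spend", ad_spend), ("store_id", store_id)]:
    print(f"{name}: r = {pearson(xs, revenue):.2f}")
```

A predictor with |r| near 1 is a strong candidate; one near 0 adds little. In practice the same idea is applied with library routines (e.g. a correlation matrix) across all candidate variables, which also helps surface the modeling problems, such as highly correlated predictors, mentioned below.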
As we aim to capture the most relevant variables/predictors, we would need to be
vigilant for any data modeling or correlation problems. We can choose to analyze