Data Warehouses and Hadoop Integration - Microsoft Big Data Solutions

Figure 10.1 Visualizing Thomas Kejser's Big Picture Data Warehouse Architecture

Code First (Schema Later)

This code-first approach does have other advantages. Cleansing operations, for example, “get in the way” of mining the data for valuable nuggets of information. Code first also carries less risk when you keep all the data in its raw form: you can always augment your program, redeploy, and restart the analysis, because the original data is still at your disposal.
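As a minimal sketch of this idea, the Python below keeps hypothetical raw log records untouched and applies the schema only in the reading code. The log format and the `parse_v1`, `parse_v2`, and `run` helpers are invented for illustration and are not part of any Hadoop API. "Redeploying" here simply means swapping in an augmented parser and rerunning it over the same raw records.

```python
# Hypothetical raw records, stored exactly as they arrived (no schema imposed).
RAW_LOG = [
    "2013-05-01T10:00:00,alice,/home,200",
    "2013-05-01T10:00:05,bob,/cart,500",
    "2013-05-01T10:01:10,alice,/cart,200",
]


def parse_v1(line):
    """First version of the analysis: only the user and page matter."""
    _ts, user, page, _status = line.split(",")
    return {"user": user, "page": page}


def parse_v2(line):
    """Augmented version: now also reads the timestamp and HTTP status."""
    ts, user, page, status = line.split(",")
    return {"ts": ts, "user": user, "page": page, "status": int(status)}


def run(parser, raw):
    # The schema lives in the parser, so it is applied at read time.
    return [parser(line) for line in raw]


# Initial analysis.
v1 = run(parse_v1, RAW_LOG)

# Later, a new question (server errors) arises; because the raw data was
# kept, we just rerun an augmented parser over the same records.
v2 = run(parse_v2, RAW_LOG)
errors = [record for record in v2 if record["status"] >= 500]
```

Because nothing was discarded at load time, the second question costs only a new parser, not a new data collection effort.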

You can't always go back if you have followed a schema-first approach. If you have modeled the data to follow one structure, you may have transformed or aggregated the source information, making it impossible to go back to the beginning and reload from scratch. Worse still, the source data may have been thrown away. That data could have been collected years ago, and it might be difficult, or in some cases impossible, to get it back.
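The loss described above can be illustrated with a small, hypothetical Python example. The records and the `load_schema_first` function are invented for illustration, and a real warehouse load would be far more involved; the point is only that once the aggregated table is all that remains, the columns it dropped cannot be reconstructed.

```python
from collections import Counter

# Hypothetical raw records arriving from a source system.
raw_log = [
    "2013-05-01T10:00:00,alice,/home,200",
    "2013-05-01T10:00:05,bob,/cart,500",
    "2013-05-01T10:01:10,alice,/cart,200",
]


def load_schema_first(raw):
    """Schema-first load: the target model only has (page, hits)."""
    return dict(Counter(line.split(",")[2] for line in raw))


warehouse_table = load_schema_first(raw_log)

# Suppose the source rows are now discarded to save space.
raw_log = None

# A later question such as "which users saw errors on /cart?" cannot be
# answered: user and status never made it into the modeled table.
print(warehouse_table)
```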

Consequently, Hadoop developers tend to shy away from a schema-first/code-later design. They worry less about the structure of the data and focus instead on finding the value in it. Remember, Hadoop
