Figure 10.1 Visualizing Thomas Kejser's Big Picture Data Warehouse Architecture
Code First (Schema Later)
This code-first approach does have other advantages. Cleansing operations,
for example, “get in the way” of mining the data for valuable nuggets of
information. There is also reduced risk in using code first when you are
keeping all the data in its raw form. You can always augment your program,
redeploy, and restart the analysis because you still have all the data at your
disposal in its raw form.
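
As a minimal sketch of this idea (written in plain Python rather than as a Hadoop job, and using made-up sample records and hypothetical function names), the raw lines below are never modified. Structure is applied only when the data is read, so a new question can be answered by adding a function and rerunning over the same raw input.

```python
import json

# Raw events kept exactly as collected (hypothetical sample data).
RAW_LINES = [
    '{"user": "ann", "action": "view", "ms": 120}',
    '{"user": "bob", "action": "buy", "ms": 340, "amount": 19.99}',
    'not valid json',
    '{"user": "ann", "action": "buy", "ms": 95, "amount": 5.00}',
]


def read_events(lines):
    """Apply structure at read time; skip lines that do not parse."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # No up-front cleansing: bad records are simply ignored here
            # and remain available in the raw data.
            continue


def count_actions(lines):
    """First version of the analysis: count events per action."""
    counts = {}
    for event in read_events(lines):
        action = event.get("action")
        counts[action] = counts.get(action, 0) + 1
    return counts


def revenue_by_user(lines):
    """Analysis added later: it needs a field the first version ignored."""
    totals = {}
    for event in read_events(lines):
        if "amount" in event:
            user = event["user"]
            totals[user] = totals.get(user, 0.0) + event["amount"]
    return totals


if __name__ == "__main__":
    print(count_actions(RAW_LINES))
    print(revenue_by_user(RAW_LINES))
```

Because the second analysis reads the same untouched input as the first, nothing has to be reloaded or remodeled when the program is augmented.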
You can't always go back if you have followed a schema-first approach.
If you have modeled the data to follow one structure, you may have
transformed or aggregated the source information, making it impossible to
go back to the beginning and reload from scratch. Worse still, the source
data may have been thrown away. If it was collected years ago, it might be
difficult, or in some cases impossible, to get that data back.
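
The loss of information can be illustrated with a small, hypothetical example (the sample rows and function names are assumptions made for illustration, not taken from the text). Once the rows are aggregated into daily totals, a later per-customer question can only be answered if the raw rows were kept.

```python
from collections import defaultdict

# Hypothetical raw sales rows: (day, customer, amount).
RAW_SALES = [
    ("2014-01-01", "ann", 5.0),
    ("2014-01-01", "bob", 20.0),
    ("2014-01-02", "ann", 7.5),
]


def load_daily_totals(rows):
    """Schema-first load: keep only one total per day."""
    daily = defaultdict(float)
    for day, _customer, amount in rows:
        daily[day] += amount
    return dict(daily)


def spend_by_customer(rows):
    """A later question that needs the customer column from the raw rows."""
    totals = defaultdict(float)
    for _day, customer, amount in rows:
        totals[customer] += amount
    return dict(totals)


daily_totals = load_daily_totals(RAW_SALES)

# The aggregated form no longer contains any customer information, so
# spend_by_customer cannot be computed from daily_totals alone.
if __name__ == "__main__":
    print(daily_totals)
    print(spend_by_customer(RAW_SALES))
```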
Consequently, Hadoop developers tend to shy away from a schema-first/code-later
design. They worry less about the structure of the data and
instead focus on looking for the value in the data. Remember, Hadoop
 