Database Reference
In-Depth Information
Move data in one direction
The data coming through your analytics system should move from the source systems on down the
various phases that you have outlined (see Figure 10-1).
Figure 10-1: Move data in one direction.
Following this principle allows you to ensure data integrity every time you run your script and makes
maintenance and troubleshooting easier.
You should also set up your script to run all the components of every phase together. This keeps your
design simple and allows you to avoid building tables with incomplete data. Think of each circle in
Figure 10-1 as a gateway or checkpoint. The data stops after each of those points and waits for the
rest of the data points in that phase to come in. After that is done, you move on to the next phase.
Divide data according to metrics and attributes
When you first build an analytics system, you may be tempted to organize all your data into one large
table. At first glance, this may seem like the way to go, but as you get deeper into the process you'll
find that this approach has two main limitations:
Bad performance: A large table approach leads to bad performance. This is due to several
reasons, including unnecessarily repeating data, having to redundantly update that data
when changes occur, and increasing the size of data on disk. Performance problems may not
show up on day one, but they will creep up as your data volumes increase.
Lack of flexibility: Advanced analytics often require multiple metrics that are not always at
the same granularity; for example, order count versus order item count. When you lump all
the metrics together, you limit your analytics capabilities because you are artificially fitting
metrics into one structure.
A more optimal approach is to organize your metrics and attributes separately according to granularity
and relationships. Here are two high-level guidelines to follow:
Separate attributes from metrics. Attributes are generally strings such as names or descrip-
tions. There is no need for them to be lumped in with metrics. You can put them in their own
tables and reference a key relationship to the table that contains your metrics. Figure 10-2 con-
tains a flat reporting table. It's a single table that has the product information, the customer
details, and the orders. Figure 10-3 organizes the data into three tables according to relation-
ships. Consider that the Product Name has a misspelling in it and needs to be corrected.
 
Search WWH ::




Custom Search