Database Reference
In-Depth Information
Stay organized
The most common mistake beginners make when building an analytics system is not being orga-
nized. Preparing your data for analytics should be divided into multiple stages. Each stage performs a
special purpose in the data preparation process leading up to the final output.
To illustrate the concept, consider the example of a restaurant. You would not put the kitchen equip-
ment where the food preparation occurs in the dining room where customers eat, would you? You
want your customers to enjoy their meal without the clutter of the kitchen operations. Inside the
kitchen you would separate the area by function to improve efficiency. The same concept is true for
your analytics system.
You want to build an analytics system that is easy to reuse. Typically that involves build-
ing the data structure and updating the data within it on a regular basis. The data inside
that structure can be used to build reoccurring reports or to run ad hoc queries. Once
built, these systems are not completely static in terms of their structure and must con-
tinuously evolve to keep up with the business changes.
Note
We recommend separating the tables and scripts into various stages and delineating those stages
using a specific naming convention. For a simple system, organize your system in three stages:
Data extraction: This stage includes the tables, files, and scripts that are involved in extract-
ing data from the various source systems. Keep this stage simple and light on business logic.
For most of your extract, you can rely on a change data capture (CDC) date that outlines the
changed data in your system. Otherwise you always have to pull all source data upon every
extract, which makes your system slow and degrades performance over time as data sizes
grow.
Data preparation: This stage includes the tables and scripts that are involved in preparing
the data to be inserted in the final output. Most of the system's business logic is in this stage.
The final step in this stage is to prepare the incremental data set to be loaded into each table
in the final output. So, you should have a mirror table on each Final Output table. Think of
this as plating the dishes and setting them out to be picked up by the waiters and delivered
to the customers. It may seem like a redundant step to stage your data in a mirror table to
your final destination, but there are good reasons for doing so, including data validation,
data integrity, consistency, and reusability.
Data delivery: This stage includes the tables and scripts that are involved in the final deliv-
ery of the data in the analytics system. The delivery steps typically involve updating existing
records and inserting new ones. If you completely reload your system every time, you would
obviously not have any records to update. We recommend you follow an incremental update
strategy to maintain system scalability.
Complex systems require additional categories and phases and more sophisticated processes com-
pared to what we have just outlined. As you feel more comfortable with your analytics system, you
can expand on this foundation.
Following a logical organization when building your system can make a big difference in the long run
and help keep it maintainable and efficient.
 
Search WWH ::




Custom Search