Creating and Managing SQL Scripts - Microsoft Business Intelligence Tools for Excel Analysts

Database Reference

In-Depth Information

Stay organized

The most common mistake beginners make when building an analytics system is not being orga-

nized. Preparing your data for analytics should be divided into multiple stages. Each stage performs a

special purpose in the data preparation process leading up to the final output.

To illustrate the concept, consider the example of a restaurant. You would not put the kitchen equip-

ment where the food preparation occurs in the dining room where customers eat, would you? You

want your customers to enjoy their meal without the clutter of the kitchen operations. Inside the

kitchen you would separate the area by function to improve efficiency. The same concept is true for

your analytics system.

You want to build an analytics system that is easy to reuse. Typically that involves build-

ing the data structure and updating the data within it on a regular basis. The data inside

that structure can be used to build reoccurring reports or to run ad hoc queries. Once

built, these systems are not completely static in terms of their structure and must con-

tinuously evolve to keep up with the business changes.

Note

We recommend separating the tables and scripts into various stages and delineating those stages

using a specific naming convention. For a simple system, organize your system in three stages:

➤ Data extraction: This stage includes the tables, files, and scripts that are involved in extract-

ing data from the various source systems. Keep this stage simple and light on business logic.

For most of your extract, you can rely on a change data capture (CDC) date that outlines the

changed data in your system. Otherwise you always have to pull all source data upon every

extract, which makes your system slow and degrades performance over time as data sizes

grow.

➤ Data preparation: This stage includes the tables and scripts that are involved in preparing

the data to be inserted in the final output. Most of the system's business logic is in this stage.

The final step in this stage is to prepare the incremental data set to be loaded into each table

in the final output. So, you should have a mirror table on each Final Output table. Think of

this as plating the dishes and setting them out to be picked up by the waiters and delivered

to the customers. It may seem like a redundant step to stage your data in a mirror table to

your final destination, but there are good reasons for doing so, including data validation,

data integrity, consistency, and reusability.

➤ Data delivery: This stage includes the tables and scripts that are involved in the final deliv-

ery of the data in the analytics system. The delivery steps typically involve updating existing

records and inserting new ones. If you completely reload your system every time, you would

obviously not have any records to update. We recommend you follow an incremental update

strategy to maintain system scalability.

Complex systems require additional categories and phases and more sophisticated processes com-

pared to what we have just outlined. As you feel more comfortable with your analytics system, you

can expand on this foundation.

Following a logical organization when building your system can make a big difference in the long run

and help keep it maintainable and efficient.

Search WWH ::

Custom Search

Home