Chapter 5. Solving a Problem You Didn't Know You Had
Whenever you build a system, it's good practice to test it before you begin using it, especially before it goes into production. If your system is designed to store huge amounts of time series data, such as two years' worth of sensor data, for critical operations or analysis, testing is particularly important. The failure of a monitoring system for drilling or pump equipment on an oil rig, for manufacturing equipment, for medical equipment, or for an airplane can have dire consequences in terms of financial loss and physical damage, so it is essential that your time series data storage engine be not only high-performance but also robust. Sometimes people do advance testing on a small data sample, but tests at this small scale are not necessarily reliable predictors of how your system will function at scale. For serious work, you want a serious test, using full-scale data. But how can you do that?
The Need for Rapid Loading of Test Data
Perhaps you have preexisting data covering a long time range that could be used for testing; at the least, you can fairly easily build a program that generates synthetic data to simulate your two years of information. Either way, you now face a problem you may not have realized you had: if your system design was already pushing the limits of data ingestion to handle the high-velocity data expected in production, how will you load two years' worth of such data in a reasonable time? If you don't want to wait two years to perform the test, you must either give up on a full-scale test by downsampling, or find a clever way to speed up test data ingestion enormously compared with normal production rates. In this example, ingesting two years of data in a day or two means ingesting test data 100 to 1,000 times faster than your production rate. Even if your production ingestion rate is only moderately high, your test ingestion rate is likely to need to be outrageous. We choose the second option: speeding up the ingestion rate for test data. That's where the open source code extensions developed by MapR (described in Chapter 4) come to the rescue.
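The 100 to 1,000 times figure follows from simple arithmetic: the required speedup is the span of data divided by the time allowed to load it. The Python sketch below works that through, assuming two years is about 730 days and that production data arrives in real time. The 10,000 points per second production rate and the function names are purely hypothetical and illustrative; they are not part of any MapR tool.

```python
def required_speedup(data_span_days: float, load_window_days: float) -> float:
    """How many times faster than real time the test data must be ingested."""
    if load_window_days <= 0:
        raise ValueError("load window must be positive")
    return data_span_days / load_window_days


def required_test_rate(production_rate: float,
                       data_span_days: float,
                       load_window_days: float) -> float:
    """Test ingestion rate, in the same units as production_rate (e.g. points/s).

    Assumes production data arrives in real time, so the production rate is
    exactly what is needed to keep up with one day of data per day.
    """
    return production_rate * required_speedup(data_span_days, load_window_days)


if __name__ == "__main__":
    TWO_YEARS_DAYS = 730       # assumption: ignores leap days
    PRODUCTION_RATE = 10_000   # hypothetical production rate, points/s
    for window in (1, 2):      # load the test data in one or two days
        speedup = required_speedup(TWO_YEARS_DAYS, window)
        rate = required_test_rate(PRODUCTION_RATE, TWO_YEARS_DAYS, window)
        print(f"{window}-day load: {speedup:.0f}x real time, {rate:,.0f} points/s")
```

Loading in one or two days works out to roughly 365 to 730 times real time, inside the 100 to 1,000 range; a longer history or a shorter load window pushes the factor toward the upper end.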