latter date format is adequate for the type of data mining being performed, it would make sense to
simplify the attribute containing dates and times when we create our data set. Data sets may be
made up of a representative sample of a larger set of data, or they may contain all observations
relevant to a specific group. We will discuss sampling methods and practices in Chapter 3.
TYPES OF DATA
Thus far in this text, you've read about some fundamental aspects of data which are critical to the
discipline of data mining. But we haven't spent much time discussing where that data is going to
come from. In essence, there are really two types of data that can be mined: operational and
organizational.
The most elemental type of data, operational data, comes from transactional systems which record
everyday activities. Simple encounters like buying gasoline, making an online purchase, or
checking in for a flight at the airport all result in the creation of operational data. The times,
prices and descriptions of the goods or services we have purchased are all recorded. This
information can be combined in a data warehouse or may be extracted directly into a data set from
the OLTP system.
Oftentimes, transactional data is too detailed to be of much use, or the detail may compromise
individuals' privacy. In many instances, government, academic or not-for-profit organizations may
create data sets and then make them available to the public. For example, if we wanted to identify
regions of the United States which are historically at high risk for influenza, it would be difficult to
obtain permission and to collect doctor visit records nationwide and compile this information into
a meaningful data set. However, the U.S. Centers for Disease Control and Prevention (CDC) does
exactly that every year. Government agencies do not always make this information immediately
available to the general public, but it often can be requested. Other organizations create such
summary data as well. The grocery store mentioned at the beginning of this chapter wouldn't
necessarily want to analyze records of individual cans of green beans sold, but they may want to
watch trends for daily, weekly or perhaps monthly totals. Organizational data sets can help to
protect people's privacy, while still proving useful to data miners watching for trends in a given
population.
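
As a rough illustration of how detailed operational records might be rolled up into organizational summaries, the Python sketch below totals individual sales by day, week and month. The sample records, the field layout and the function names (summarize, iso_week) are hypothetical and are not taken from any particular store's system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational records: one row per sale (date, product, quantity).
transactions = [
    (date(2024, 1, 1), "green beans", 2),
    (date(2024, 1, 1), "green beans", 1),
    (date(2024, 1, 2), "green beans", 4),
    (date(2024, 1, 8), "green beans", 3),
]


def summarize(records, period_of):
    """Sum quantities per (period, product); period_of maps a date to a period label."""
    totals = defaultdict(int)
    for sold_on, product, quantity in records:
        totals[(period_of(sold_on), product)] += quantity
    return dict(totals)


def iso_week(d):
    """Label a date with its ISO year and ISO week number."""
    year, week, _ = d.isocalendar()
    return (year, week)


daily_totals = summarize(transactions, lambda d: d)
weekly_totals = summarize(transactions, iso_week)
monthly_totals = summarize(transactions, lambda d: (d.year, d.month))

print(weekly_totals)
```

The summaries keep only totals per period, so the individual purchases no longer appear in the resulting data set, yet the trend over time is still visible.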