Explain some of the ethical dilemmas associated with data mining and outline possible
solutions
PURPOSES, INTENTS AND LIMITATIONS OF DATA MINING
Data mining, as explained in Chapter 1 of this text, applies statistical and logical methods to large
data sets. These methods can be used to categorize the data, or they can be used to create predictive
models. Categorizations of large data sets may involve grouping people into similar
classifications or identifying similar characteristics across a large number of observations.
Predictive models, however, transform these descriptions into expectations upon which we can
base decisions. For example, the owner of a book-selling Web site could project how frequently
she may need to restock her supply of a given title, or the owner of a ski resort may attempt to
predict the earliest possible opening date based on projected snow arrivals and accumulations.
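The difference between categorizing data and building a predictive model can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from this text. The weekly sales figures, the fast/slow threshold, and the helper names `categorize_titles`, `fit_linear_trend` and `weeks_until_restock` are all hypothetical. A straight-line trend fitted by ordinary least squares stands in for whatever predictive model a real bookseller might use.

```python
from statistics import mean

# Hypothetical weekly unit sales for three titles (illustrative data only).
sales_by_title = {
    "Title A": [12, 15, 14, 18, 20, 21],
    "Title B": [3, 2, 4, 3, 2, 3],
    "Title C": [9, 11, 10, 12, 11, 13],
}

def categorize_titles(sales, threshold=10):
    """Descriptive step: label each title by its average weekly sales.

    The threshold of 10 units per week is an arbitrary, hypothetical cut-off.
    """
    return {
        title: "fast mover" if mean(weeks) >= threshold else "slow mover"
        for title, weeks in sales.items()
    }

def fit_linear_trend(ys):
    """Ordinary least-squares fit of y = a + b*x, with weeks x = 0, 1, 2, ...

    Needs at least two observations, otherwise sxx is zero.
    """
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def weeks_until_restock(stock_on_hand, ys, max_weeks=52):
    """Predictive step: count the full future weeks the current stock is
    projected to cover, extrapolating the fitted trend week by week.

    Returns None if stock is not projected to run out within max_weeks.
    """
    intercept, slope = fit_linear_trend(ys)
    sold = 0.0
    for ahead in range(max_weeks):
        week = len(ys) + ahead
        projected = max(intercept + slope * week, 0.0)  # sales cannot be negative
        if sold + projected > stock_on_hand:
            return ahead  # full weeks covered before stock runs short
        sold += projected
    return None

print(categorize_titles(sales_by_title))
print(weeks_until_restock(60, sales_by_title["Title A"]))
```

The categorization only describes past sales. The trend line turns that description into an expectation that a restocking decision can rest on. With these figures, 60 units of Title A are projected to cover two full weeks. A straight line will not anticipate seasonality or a sudden drop in demand, so the projection is only as reliable as the sales history behind it.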
It is important to recognize that data mining cannot answer every question, nor can we
expect predictive models always to produce results that turn out to match reality.
Data mining is limited to the data that has been collected, and those limitations may be many.
We must remember that the data may not be completely representative of the group of individuals
to which we would like to apply our results. The data may have been collected incorrectly, or it
may be out of date. An expression that applies to data mining, among many other things, is
GIGO: Garbage In, Garbage Out. The quality of our data mining results depends directly on the
quality of our data collection and organization. Even after doing our very best to collect
high-quality data, we must still base decisions not only on data mining results, but also on
available resources, acceptable levels of risk, and plain old common sense.
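A small, hypothetical illustration of GIGO: a single mis-keyed sales figure is enough to distort a simple summary statistic, and a basic range check applied before mining can flag it. The sales figures, the plausible range of 0 to 100 units, and the `split_by_range` helper are assumptions made for illustration. Real validation rules depend on the domain, and no automated check of this kind can detect data that is unrepresentative or out of date.

```python
from statistics import mean

# Hypothetical weekly sales, recorded correctly.
clean = [12, 15, 14, 18, 20, 21]
# The same data with one data-entry error: 21 was keyed in as 210.
dirty = [12, 15, 14, 18, 20, 210]

def split_by_range(values, low, high):
    """Separate values inside [low, high] from suspected entry errors.

    The plausible range is a hypothetical, domain-specific assumption.
    """
    accepted = [v for v in values if low <= v <= high]
    rejected = [v for v in values if not low <= v <= high]
    return accepted, rejected

accepted, rejected = split_by_range(dirty, low=0, high=100)

print(f"mean of clean data:   {mean(clean):.2f}")     # 16.67
print(f"mean of dirty data:   {mean(dirty):.2f}")     # 48.17
print(f"rejected as suspect:  {rejected}")            # [210]
print(f"mean after filtering: {mean(accepted):.2f}")  # 15.80
```

Filtering is not the same as correcting. Dropping the bad record brings the mean back near the true value, but not all the way to it, because a genuine week of sales has been lost along with the error.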
DATABASE, DATA WAREHOUSE, DATA MART, DATA SET…?
To understand data mining, it is important to understand the nature of databases, data
collection and data organization. This understanding is fundamental to the discipline of data mining
and directly affects the quality and reliability of all data mining activities. In this section, we will
 
 