material). The liquid sludge is diverted into holding tanks and referred to as the
pregnant solution—a liquid sludge containing 70% of the gold.
Like rock and ore, raw data needs to be prepared. The mecha-
nisms for refining it to enable knowledge extraction involve data
analysis technology, data cleansing, transformations, and attribute
synthesis. These are big terms for problems such as graphing data
values, correcting typos, dealing with missing values, categorizing
data values (e.g., age) into buckets instead of continuous values,
and creating new attributes based on other attributes (e.g., cus-
tomer lifetime value).
[The liquid sludge] is drawn from holding tanks through a clarifier , a device
that removes all the remaining rock or clay from a pregnant solution. In the
next step, the material is taken to a de-areator tank that removes bubbles of air and
further clarifies the solution.
The dataset as presented to the data mining algorithm can be
viewed as the “pregnant solution.” As a data mining algorithm
executes, it makes finer and more precise distinctions about the
data to extract knowledge. This can be in the form of, for example,
rules that define customer profiles, common co-occurrences of
product sales enabling cross-sell, or a representative case that
describes a set of patients susceptible to a type of cancer.
Zinc is added in dust form to the de-areated solution, which is drawn under
pressure through a filter press; which causes the gold and zinc to precipitate
onto canvas (heavy cloth) filter leaves. This zinc-gold precipitate (condensed into a
solid) is then cleaned from the filters while extreme heat burns off the zinc.
The purified “precipitate” of data mining is the emerging model,
which contains the extracted knowledge. It needs to be tested and
possibly refined through changing of parameters or further prepa-
ration of the data to produce a sufficient knowledge yield.
Water passing through the filters is chemically tested for gold residue before
being discharged into tailings ponds. Gold bearing water may be passed
through the filtering process several times to remove all of the gold and separate it
from impure substances.
Mining algorithms will often make several passes over the data
to continually tune or refine the model. Algorithms, such as neural
networks, decision trees, and K-means clustering, make multiple
passes over the data until any further improvements are deter-
mined insignificant or some other stopping criterion is met.