A third important factor influencing certainty is significance. Significance
indicates whether a discovered result could plausibly be due to coincidence. For
instance, when a coin is tossed a hundred times, heads and tails may each be
expected to occur about fifty times. A 49-51 ratio may be considered a
coincidence, but a 30-70 ratio is difficult to dismiss as coincidental: it differs
significantly from what is expected.
With the help of confidence intervals (see below), it is possible to determine how
likely it is that a discovered result is a coincidence.
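To make the coin example concrete, the sketch below computes an exact two-sided binomial test, a standard way of quantifying significance (the text itself mentions confidence intervals; the binomial test is swapped in here because it answers the "is this coincidence?" question directly). The function name `binomial_two_sided_p` is hypothetical, and a fair coin (p = 0.5) is assumed.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: the probability, if the coin is fair,
    of an outcome at least as unlikely as k heads in n tosses."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Add up every outcome that is no more likely than the observed one;
    # the small tolerance keeps mirror-image outcomes (e.g. 30 and 70 heads)
    # from being dropped because of floating-point rounding.
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)

# The two outcomes discussed in the text: 49-51 and 30-70 out of 100 tosses.
for heads in (49, 30):
    print(f"{heads}-{100 - heads}: p = {binomial_two_sided_p(heads, 100):.2g}")
```

A 49-51 split yields a p-value of about 0.92, so nothing unusual has happened, whereas a 30-70 split yields a p-value far below the conventional 0.05 and 0.01 thresholds, which matches the intuition that it is hard to put down to chance.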
Once the certainty of particular knowledge has been determined using a chosen
mathematical method, it is up to the user to decide whether that certainty is
sufficient for further use of that knowledge. The standard technique for quantifying
certainty in regression techniques is to calculate the standard error. The
standard error indicates how far the data deviate from the fitted regression
function: the larger the standard error, the wider the spread of the data around
that function. Using standard errors, it is possible to calculate confidence
intervals. A confidence interval is a range of values with a given chance of
containing the true value. In most cases, the confidence level is set at 95 or
99 per cent.
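The relationship between the standard error and a confidence interval can be sketched for a simple straight-line regression. The example below is a minimal sketch that assumes ordinary least squares with one explanatory variable and a normal approximation (z = 1.96 for 95 per cent, 2.576 for 99 per cent); for small samples a larger quantile from the t-distribution would be more appropriate. The function `fit_line_with_ci` and the data are hypothetical.

```python
import math
from statistics import mean

def fit_line_with_ci(xs, ys, z=1.96):
    """Fit y = a + b*x by least squares and return (a, b, s, (low, high)):
    s is the residual standard error, (low, high) an approximate
    confidence interval for the slope b."""
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need two equally long sequences of at least 3 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual standard error: how far, typically, the data lie from the line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # Standard error of the slope, turned into an interval via the z quantile.
    se_b = s / math.sqrt(sxx)
    return a, b, s, (b - z * se_b, b + z * se_b)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.9, 16.2]
a, b, s, (low, high) = fit_line_with_ci(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x, standard error {s:.2f}, "
      f"95 % CI for slope [{low:.2f}, {high:.2f}]")
```

Passing `z=2.576` widens the interval to roughly 99 per cent confidence; the more widely the data are spread around the line (the larger `s`), the wider the interval becomes.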
Finally, it should be mentioned that for profiles, certainty is closely related to
reliability. The reliability of a profile may be split into (a) the reliability of the
profile itself, which comprises certainty, and (b) the reliability of the use of the
profile. This distinction is made because a particular profile may be entirely
correct from a technological perspective, but may still be applied incorrectly. For
instance, when data mining reveals that 80 % of all motels are next to highways,
this result has a particular certainty. If all motels were counted, the
certainty of this pattern is 100 %, but if a sample of 300 motels was taken into
consideration, of which 240 turned out to lie next to highways, the certainty is
lower because the result is extrapolated from the sample to all motels. However,
if a motel closes or a new motel
opens, the reliability of the pattern decreases, because the pattern is based on data
that are no longer up to date, yielding a pattern that represents reality with less
reliability. The reliability of the use of a particular profile is yet another notion.
Suppose a particular neighborhood has an unemployment rate of 80 %. When a
local government sends every resident of this neighborhood a letter about
unemployment benefits, its use of the profile is not 100 % reliable, as the letter
also reaches people who are employed.
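Both kinds of reliability can be illustrated with a little arithmetic. The sketch below assumes a simple normal-approximation (Wald) interval, which the text does not prescribe, to attach a 95 % confidence interval to the motel estimate from the 300-motel sample, and then counts how many letters in the neighborhood example reach people outside the target group. The function `proportion_ci` and the neighborhood size of 1000 residents are hypothetical.

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96):
    """Sample proportion with a normal-approximation (Wald) confidence
    interval, clipped to [0, 1]."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Certainty of the profile: 240 of 300 sampled motels lie next to a highway.
p_hat, low, high = proportion_ci(240, 300)
print(f"estimate {p_hat:.0%}, 95 % CI {low:.1%} to {high:.1%}")

# Reliability of use: a letter sent to every resident of a neighborhood with
# 80 % unemployment. The number of residents is hypothetical.
residents = 1000
unemployment_rate = 0.80
misaddressed = round(residents * (1 - unemployment_rate))
print(f"{misaddressed} of {residents} letters reach employed residents")
```

For the motel sample the interval comes out at roughly 75 % to 85 %. Neither number captures the loss of reliability caused by outdated data: the interval says nothing about motels that have since closed or opened, which only fresh data can address.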
1.2.3 Profiles of Individuals and of Groups
Profiling is the process of creating profiles. Although profiles can be made of
many things, such as countries, companies or processes, here we focus on
profiles of people or groups of people. Hence, we consider a profile a property or
a collection of properties of an individual or a group of people. Several names
exist for these profiles. Personal profiles are also referred to as individual profiles
or customer profiles, while group profiles are also referred to as aggregated
profiles. Others use the terms abstract profiles and specific profiles for group