the rule itself is possible, thus it is assigned an RF value of 1; otherwise RF = 0.
However, this measure is not exact since, for instance, an adversary may not
actually learn an itemset despite knowing its subsets.
Bertino et al. [12] propose a set of measures that directly relate the performance
of a hiding algorithm to external parameters. These process performance measures
are clustered into four categories, as follows:
(a) Efficiency. This category consists of measures that quantify the ability of a
privacy preserving algorithm to efficiently use the available resources and execute
with good performance. Efficiency is measured in terms of CPU-time, space
requirements (related to the memory usage and the required storage capacity)
and communication requirements.
(b) Scalability. This category consists of measures that evaluate how effectively the
privacy preserving technique handles increasing sizes of the data from which
information needs to be mined and privacy needs to be ensured. Scalability
is measured based on the decrease in the performance of the algorithm or the
increase of the storage requirements along with the communications cost (if in
a distributed setting), when the algorithm is provided with larger datasets.
(c) Data Quality. The data quality of a privacy preservation algorithm depends on
two factors: the quality of the dataset after the sanitization process, and
the quality of the data mining results obtained from this dataset, compared
to those attained when using the original dataset. Among the various possible
measures for the quantification of data quality, the most preferable are: (i)
accuracy , which measures the proximity of a sanitized value to the original one
and is closely related to the information loss resulting from the hiding strategy,
(ii) completeness , which is used to evaluate the degree of missed data in the
sanitized database and (iii) consistency , which is related to the relationships
that must continue to hold among the different fields of a data item or among
data items in a sanitized database. Examples of data quality measures are Diss
(presented earlier) and Kullback-Leibler (KL) divergence.
(d) Privacy Level. This category consists of measures that estimate the degree of
uncertainty according to which, the protected information can still be predicted.
Measures such as the information entropy, the level of privacy, and the J-measure
[12] are some of the possible metrics that one can apply to quantify the
privacy level attained by a hiding scheme.
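To make two of the measures named above concrete, the following sketch computes the Kullback-Leibler divergence (a data quality measure) between hypothetical itemset-support distributions before and after sanitization, and the Shannon entropy (a privacy-level measure) of a hypothetical attribute column. All data values here are illustrative assumptions, not taken from the text.

```python
import math
from collections import Counter

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q), in bits, between two
    discrete probability distributions; 0 when they coincide."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of the
    given values; higher entropy means greater adversarial uncertainty."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical itemset-support distributions before and after hiding:
# the closer the sanitized distribution stays to the original, the
# smaller the KL divergence, i.e., the lower the information loss.
original_supports = [0.4, 0.3, 0.2, 0.1]
sanitized_supports = [0.35, 0.35, 0.2, 0.1]
print(kl_divergence(original_supports, sanitized_supports))

# Hypothetical sensitive attribute column: the sanitized version is
# more uniform, so its entropy (privacy level) is higher.
original_column = ["a", "a", "a", "a", "b"]
sanitized_column = ["a", "a", "b", "b", "c"]
print(entropy(original_column), entropy(sanitized_column))
```

A hiding algorithm with good data quality keeps the KL divergence near zero, while a high privacy level corresponds to high residual entropy over the protected values.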
4 Cryptographic Methods
Over the years, many data mining protocols have been designed to mine distributed
data that reside in different data warehouses. In those protocols, data are generally
assumed to be either vertically or horizontally partitioned. Table 15.1 shows a trivial
example of two different data partitioning schemes for a simple transaction (binary)
dataset U , consisting of four attributes.
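The two partitioning schemes can be sketched as follows; the binary dataset and the partition boundaries below are illustrative assumptions, not the actual contents of Table 15.1.

```python
# Hypothetical binary transaction dataset U with four attributes
# (call them A, B, C, D); each row is one transaction.
U = [
    # A, B, C, D
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

# Horizontal partitioning: each site holds a subset of the
# transactions over the full attribute set.
site1_horizontal = U[:2]
site2_horizontal = U[2:]

# Vertical partitioning: each site holds all transactions but only a
# subset of the attributes (a shared transaction id would be needed
# in practice to join the two halves).
site1_vertical = [row[:2] for row in U]  # attributes A, B
site2_vertical = [row[2:] for row in U]  # attributes C, D

# The partitions jointly reconstruct U.
assert site1_horizontal + site2_horizontal == U
assert [a + b for a, b in zip(site1_vertical, site2_vertical)] == U
```

Distributed mining protocols differ accordingly: with horizontal partitioning each party can compute local itemset counts that are then combined, whereas with vertical partitioning no single party can even evaluate an itemset spanning both attribute subsets on its own.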