the two companies. For example, by mining Wal-Mart's customer sales data, P&G
helped Wal-Mart focus on selling the items its customers preferred and eliminate
products that rarely sold. By 1990, the two companies had managed to significantly
improve their joint business, to their mutual profitability. The vast amount of
customer data owned by Wal-Mart paved the way for the company to establish further
business collaborations with external partners in the following years. As part of
its business strategy, Wal-Mart shared (or sometimes sold) portions of its customer
sales data with large market-research companies. This action, however, later proved
to be a bad idea. As pointed out in [35], the business managers of Wal-Mart soon
found out that the dissemination of customer sales data to their partners was
sometimes leading to a leakage of strategic business knowledge to competitors.
Indeed, in several cases the strategic sales information of Wal-Mart was found to
have been used to prepare industry-wide reports that were broadly disseminated,
even to Wal-Mart's business competitors. The disclosure of these trade secrets soon
led Wal-Mart to decide that the sharing of its data was harming its own business.
As is evident from the previous discussion, there exists a wide range of
application scenarios in which collected data, or knowledge patterns extracted from
the data, have to be shared with other (possibly untrusted) entities to serve owner-
specific or organization-specific purposes. The sharing of data and/or knowledge
may come at a cost to privacy, for two main reasons: (i) if the data refers to
individuals (e.g., detailed patient-level clinical data derived from electronic
medical records, or customers' market basket data collected from a supermarket),
then its disclosure can violate the privacy of the individuals recorded in the
data, if their identity is revealed to untrusted third parties or if sensitive
knowledge about them can be mined from the data; and (ii) if the data concerns
business (or organizational) information, then disclosure of the data, or of any
knowledge extracted from it, may reveal sensitive trade secrets that can provide
a significant advantage to business competitors and thus cause the data owner to
lose business to its peers.
The aforementioned privacy concerns in the course of data mining are significantly
amplified by the fact that simple de-identification¹ of the original data prior to
its mining has in several cases proven insufficient to guarantee a privacy-aware
outcome. Indeed, intelligent analysis of the data through inference-based attacks
may reveal sensitive patterns to untrusted entities. This is mainly achieved by
utilizing external, publicly available sources of information (e.g., the yellow
pages, patient demographics and discharge summaries, or other public reports) in
conjunction with the released data or knowledge, in order to re-identify
individuals or uncover hidden knowledge patterns [21, 49]; a linking attack of this
kind is sketched after the footnote below. Thus, compliance with pri-
¹ Data de-identification refers to the process of removing obvious identifiers from the data (e.g.,
names, social security numbers, addresses, etc.) prior to its disclosure. A typical de-identification
strategy for patient-specific data is based on the Safe Harbor standard of the Health Insurance
Portability and Accountability Act (HIPAA) [1], whereby records are stripped of a number of
potential identifiers, such as personal names and geocodes.
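To make the linking-attack argument concrete, the following is a minimal sketch, not taken from the chapter: the tables, column names, and values are hypothetical, and pandas is assumed only for convenience. It shows how a release stripped of direct identifiers can still be re-identified by an equi-join with a public source (e.g., a voter list) on quasi-identifiers such as ZIP code, birth date, and sex, in the spirit of the attacks cited in [21, 49].

```python
import pandas as pd

# Hypothetical de-identified release: direct identifiers (name, SSN)
# have been removed, but quasi-identifiers remain intact.
released = pd.DataFrame({
    "zip":        ["02139", "02139", "60614"],
    "birth_date": ["1965-07-31", "1971-01-02", "1980-12-24"],
    "sex":        ["F", "M", "F"],
    "diagnosis":  ["hypertension", "diabetes", "asthma"],  # sensitive attribute
})

# Hypothetical public record (e.g., a voter list) that carries names
# alongside the same quasi-identifiers.
public = pd.DataFrame({
    "name":       ["Alice Smith", "Carol Jones"],
    "zip":        ["02139", "60614"],
    "birth_date": ["1965-07-31", "1980-12-24"],
    "sex":        ["F", "F"],
})

# Linking attack: joining on the quasi-identifiers re-attaches names
# to the "anonymous" records, exposing the sensitive diagnoses.
reidentified = released.merge(public, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```

This is one reason Safe Harbor goes beyond removing names: it also strips or coarsens quasi-identifying fields such as geocodes and dates, precisely to frustrate joins of this kind.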