Privacy-Preserving Naive Bayesian Classification over Horizontally Partitioned Data - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

Tabl e 1 .

V NB 1

V NB 2

V NB 3

···

V NB k

V NB 1

− 1

···

− 1

V NB 2

− 1

···

− 1

V NB 3

···

V NB k

− 1

···

Tabl e 2 .

S 1

S 2

S 3

S 4

Weight

S 1

− 1

− 2

S 2

− 1+1

S 3

S 4

− 1

− 1+1

one is the ϕ , the sequence of e ( V NB i − V NB j ), for i,j ∈ [1 ,k ]( i>j ), and

the other is ϕ , the sequence of +1 /

1. The two sequences have the same

number of elements. P 1 knows whether or not V NB i is larger than V NB j by

checking the corresponding value in the ϕ sequence. For example, if the first

element ϕ is

−

1, P 1 concludes V NB i <V NB j . P 1 examines the two sequences

and constructs the index table (Table 1) to compute the largest element.

In Table 1, +1 in entry ij indicates that the information gain of the row

(e.g., V NB i

−

of the i th row) is not less than the information gain of a column

(e.g., V NB j

1, otherwise. P 1 sums the index values of

each row and uses this number as the weight of the information gain in that

row. She then selects the one that corresponds to the largest weight.

To make it clearer, let's illustrate it by an example. Assume that: (1)

there are four information gains with V NB 1 <V NB 4 <V NB 2 <V NB 3 ; (2) the

sequence ϕ is [ e ( V NB 1 −

of the j th column);

−

V NB 2 ) ,e ( V NB 1 −

V NB 3 ) ,e ( V NB 1 −

V NB 4 ) ,e ( V NB 2 −

sequence ϕ

V NB 3 ) ,e ( V NB 2

−

V NB 4 ) ,e ( V NB 3

−

V NB 4 )].

The

will

1 , +1 , +1]. According to ϕ and ϕ , P 1 builds the Table 2.

From the table, P 1 knows V NB 3

[

−

1 ,

−

1 ,

−

1 ,

−

is the largest element since its weight, which

is +4, is the largest.

Theorem 11. (Privacy-Preserving). Assuming the parties follow the protocol,

the private data are securely protected.

Proof. In Protocol 4, we need prove it from two aspects: (1) P 1 doesn't get

information gain (e.g., V NB i ) for each attribute. What P 1 gets are e ( V NB i −

V NB j ) for all i,j

V NB j ),

P 1 cannot know each information gain since it is encrypted. By +1 /

∈

[1 ,k ] ,i>j and +1 /

−

1 sequence. By e ( V NB i −

sequence, P 1 can only know whether or not V NB i is greater than P j .(2) P n

doesn't obtain information gain for each attribute either. Since the sequence

of e ( V NB i −

−

V NB j ) is randomized before being send to P n who can only know

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home