A Behavior-Based Approach to Securing Email Systems - Computer Network Security


For any connection between the selected user and a recipient who belongs to a clique, the algorithm implemented in EMT allocates 60% of the email flow from the user to that recipient to the clique (or cliques) containing both, and the remaining 40% to the recipient. This proportion was chosen to reflect the assumption that, when a user and a recipient belong to the same clique, most of the email flow between them belongs to the clique.

In some cases, two or more cliques share the same connection between a user and a recipient. For example, if A, B and C belong to clique 1, and A, B and D belong to clique 2, the connection between A and B is shared by the two cliques. In that case, when calculating the frequency of the flow from A to B, say, the 60% allocated to cliques is split evenly between clique 1 and clique 2. If 100 messages were sent from A to B, 40 are assigned to B, 30 to clique 1 and 30 to clique 2.
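The allocation rule in the two paragraphs above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not EMT's code: the name `allocate_connection` is hypothetical, the clique share is passed as an integer percentage (60 by default, as in the text), the clique portion is assumed to be divided evenly among however many cliques share the connection (the text only shows the two-clique case), and a connection that lies in no clique is assumed to go entirely to the recipient.

```python
def allocate_connection(messages: int, cliques: list[str],
                        clique_pct: int = 60) -> dict[str, float]:
    """Split the messages on one user->recipient connection (hypothetical helper).

    clique_pct percent of the flow goes to the cliques sharing the connection,
    divided evenly among them (assumption); the rest goes to the recipient.
    """
    if not cliques:
        # Assumption: a connection outside every clique stays with the recipient.
        return {"recipient": float(messages)}
    to_cliques = messages * clique_pct / 100
    allocation = {"recipient": messages - to_cliques}
    for clique in cliques:
        allocation[clique] = to_cliques / len(cliques)
    return allocation


# The A -> B example from the text: 100 messages, connection shared by two cliques.
print(allocate_connection(100, ["clique 1", "clique 2"]))
# {'recipient': 40.0, 'clique 1': 30.0, 'clique 2': 30.0}
```

Using an integer percentage keeps the arithmetic exact for whole-message counts like those in the example.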

Cliques tend to rank high in the frequency table, because the number of emails attributed to a clique is aggregated over several recipients. For example, assume that clique 1 = {A, B, C, D}, and that clique 1 shares no connection with other cliques. If A sent 200 messages to B, 100 to C and 100 to D, then 80 messages are allocated to B, 40 to C, 40 to D, and 240 (120 + 60 + 60) to clique 1. The clique thus receives a large share of the flow, which is expected, as cliques model small groups of tightly connected users with heavy email traffic.
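To make the aggregation effect concrete, the following sketch builds the frequency table for the clique 1 example above. It is illustrative only: `frequency_table` is a hypothetical name, it assumes a single clique that shares no connection with other cliques (exactly the case in the example), and it uses the 60% clique share from the text. Sorting the resulting `Counter` shows the clique outranking every individual recipient.

```python
from collections import Counter


def frequency_table(sent: dict[str, int], clique: str, members: set[str],
                    clique_pct: int = 60) -> Counter:
    """Frequency table for one sender whose recipients may lie in one clique.

    Hypothetical helper: for each recipient in `members`, clique_pct percent of
    the messages is credited to the clique and the rest to the recipient;
    recipients outside the clique keep all their messages.
    """
    table = Counter()
    for recipient, count in sent.items():
        if recipient in members:
            to_clique = count * clique_pct / 100
            table[clique] += to_clique
            table[recipient] += count - to_clique
        else:
            table[recipient] += count
    return table


# A sends 200 messages to B and 100 each to C and D; clique 1 = {A, B, C, D}.
table = frequency_table({"B": 200, "C": 100, "D": 100}, "clique 1", {"A", "B", "C", "D"})
print(table.most_common())
# [('clique 1', 240.0), ('B', 80.0), ('C', 40.0), ('D', 40.0)]
```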

2.5.2 Enclave Cliques vs. User Cliques

Conceptually, two types of cliques can be formulated. The ones described in the previous section can be called enclave cliques, because they are inferred by looking at the email exchange patterns of an enclave of accounts. In this view, no account is treated specially: we are interested in email flow patterns at the enclave level, and any flow violation or new flow pattern pertains to the entire enclave.

On the other hand, it is possible to look at email traffic patterns from a different viewpoint altogether. Suppose we focus on a specific account and have access to its outbound traffic log. Because an email can have multiple recipients, the recipients of each email can be viewed as a clique associated with this account. Since one such clique can be subsumed by another, we define a user clique as one that is not a subset of any other clique. In other words, the user cliques of an account are its recipient lists that are not subsets of other recipient lists.
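The definition of user cliques translates directly into a subset filter over an account's recipient lists. The sketch below is a hypothetical illustration rather than EMT's implementation: `user_cliques` is an invented name, recipient lists are treated as sets (so address order and duplicates are ignored and identical lists collapse into one), and the pairwise comparison is quadratic in the number of distinct lists, which should be acceptable for a single account's log.

```python
def user_cliques(recipient_lists: list[list[str]]) -> list[frozenset[str]]:
    """Return the distinct recipient lists that are not proper subsets of another list."""
    # Treat each recipient list as a set; identical lists collapse into one.
    distinct = {frozenset(r) for r in recipient_lists if r}
    # Keep only lists that no other list strictly contains.
    maximal = [c for c in distinct if not any(c < other for other in distinct)]
    # Sort for a deterministic order (by size, then by the sorted addresses).
    return sorted(maximal, key=lambda c: (len(c), sorted(c)))


outbound = [["B", "C"], ["B", "C", "D"], ["E"], ["B"], ["D", "C", "B"]]
print([sorted(c) for c in user_cliques(outbound)])
# [['E'], ['B', 'C', 'D']]
```

Here {B, C} and {B} are dropped because {B, C, D} contains them, while {E} survives because no other list contains it.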

To illustrate both types of cliques and show how they might be used in a spam detection task, two simulations are run. In both, various attack strategies are simulated, and detection is attempted by examining a single attack email. The final results reflect how well such detection performs statistically.

In the case of enclave cliques, the following simulation is performed. An enclave of 10 accounts is created, with each account sending 500 emails. Each email has a recipient list of at most 5 recipients, whose actual size follows a Zipf distribution in which a list's rank equals its size; i.e., single-recipient emails have rank 1 (the most frequent) and 5-recipient emails have rank 5 (the least frequent). Furthermore, for each account, a random rank is assigned to each of its potential recipients, and this ranking stays constant across all emails the account sends. Once the recipient list size of an email is determined, the actual recipients of that email are drawn from a generalized Zipf distribution with theta = 2. Finally, a threshold of 50 is used to qualify any pair of accounts as belonging to the same clique.
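The simulation setup can be sketched in Python as follows. Several details are assumptions, since the text does not specify them: the potential recipients of each account are taken to be the other nine accounts in the enclave; the Zipf exponent for the recipient-list size is set to 1 (only theta = 2 for recipient choice is given); recipients within one email are drawn without replacement; and a pair of accounts is counted once for every email in which both appear (as sender or recipient), qualifying when that count reaches the threshold of 50. The names `simulate_enclave`, `qualifying_pairs` and `zipf_weights` are hypothetical, and how qualifying pairs are assembled into cliques is left to the algorithm of the previous section.

```python
import random
from collections import Counter
from itertools import combinations

N_ACCOUNTS = 10           # enclave size (from the text)
EMAILS_PER_ACCOUNT = 500  # emails sent by each account (from the text)
MAX_LIST_SIZE = 5         # largest recipient list (from the text)
RECIPIENT_THETA = 2.0     # generalized Zipf exponent for recipients (from the text)
SIZE_THETA = 1.0          # Zipf exponent for list sizes (assumption)
THRESHOLD = 50            # pair qualification threshold (from the text)


def zipf_weights(n: int, theta: float) -> list[float]:
    """Unnormalized generalized Zipf weights: rank k gets 1 / k**theta."""
    return [1.0 / k ** theta for k in range(1, n + 1)]


def simulate_enclave(seed: int = 0) -> list[tuple[int, frozenset[int]]]:
    """Generate (sender, recipients) records for every email in the enclave."""
    rng = random.Random(seed)
    sizes = list(range(1, MAX_LIST_SIZE + 1))
    size_weights = zipf_weights(MAX_LIST_SIZE, SIZE_THETA)  # size k has rank k
    log = []
    for sender in range(N_ACCOUNTS):
        # Fixed random ranking of this sender's potential recipients (assumed
        # to be the other accounts in the enclave), reused for every email.
        ranked = [a for a in range(N_ACCOUNTS) if a != sender]
        rng.shuffle(ranked)
        for _ in range(EMAILS_PER_ACCOUNT):
            size = rng.choices(sizes, weights=size_weights)[0]
            pool = list(ranked)  # index 0 holds the rank-1 recipient
            weights = zipf_weights(len(pool), RECIPIENT_THETA)
            recipients = set()
            for _ in range(size):
                # Draw one recipient by Zipf weight, then remove it (and its
                # weight) so the same address is not picked twice.
                i = rng.choices(range(len(pool)), weights=weights)[0]
                recipients.add(pool.pop(i))
                weights.pop(i)
            log.append((sender, frozenset(recipients)))
    return log


def qualifying_pairs(log, threshold: int = THRESHOLD) -> set[tuple[int, int]]:
    """Pairs of accounts that appear together in at least `threshold` emails."""
    counts = Counter()
    for sender, recipients in log:
        for pair in combinations(sorted(recipients | {sender}), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= threshold}


log = simulate_enclave(seed=1)
pairs = qualifying_pairs(log)
print(len(log), len(pairs))
```

The log always holds 10 × 500 = 5000 emails; the number of qualifying pairs depends on the seed and on the assumed counting rule.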

In terms of attack strategies used, 5 different ones are tested. The first is to send to all potential recipient addresses, one at a time. The second, third and fourth attack
