A Behavior-Based Approach to Securing Email Systems - Computer Network Security

We seek to identify clusters or groups of related email accounts that frequently communicate with each other, and then use this information to identify unusual email behavior that violates typical group behavior. For example, it is unlikely that a user would send the same message to his spouse, his boss, his “drinking buddies” and his church elders, with all of them appearing together as recipients. A virus attacking his address book would not know the victim's social relationships and typical communication patterns, and hence would violate the user's group behavior profile if it propagated itself across the user's “social cliques”.

Clique violations may also indicate violations of internal email security policy. For example, members of a company's legal department might be expected to exchange many Word attachments containing patent applications. It would be highly unusual if members of the marketing department and HR services also received these attachments. EMT can infer the composition of related groups by analyzing normal email flows and computing cliques (see Fig. 5), and can use the learned cliques to raise an alert when emails violate clique behavior (see Fig. 7).
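To make the address-book example concrete, the following Python sketch flags a message whose recipients span several of the sender's cliques. The rule itself (a message is suspicious when the recipients that appear in the sender's cliques are not all contained in a single one of them) and the name `violates_cliques` are hypothetical; the text does not specify EMT's exact alert criterion.

```python
from typing import Iterable, List, Set


def violates_cliques(sender: str, recipients: Iterable[str],
                     cliques: List[Set[str]]) -> bool:
    """Hypothetical clique-violation rule, not EMT's documented criterion.

    Flags a message when the recipients that belong to the sender's
    cliques cannot all be found together in any single one of them.
    """
    # Only cliques containing the sender describe his usual groups.
    own = [c for c in cliques if sender in c]
    # Recipients the clique profile knows about; other addresses are ignored.
    known = {r for r in recipients if any(r in c for c in own)}
    if len(known) < 2:
        return False  # too few profiled recipients to span two groups
    return not any(known <= c for c in own)


cliques = [{"user", "spouse", "kid"},
           {"user", "boss", "colleague"},
           {"user", "pal1", "pal2"}]
print(violates_cliques("user", ["spouse", "boss", "pal1"], cliques))  # True
print(violates_cliques("user", ["boss", "colleague"], cliques))       # False
```

Ignoring recipients outside every clique keeps ordinary one-off correspondence from raising alerts; a stricter policy could treat such recipients as violations as well.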

EMT implements clique finding with the branch and bound algorithm described in [0]. Each email account is treated as a node, and an edge is established between two nodes if the number of emails exchanged between them is greater than a user-defined threshold, which is taken as a parameter (Fig. 7 is displayed with a setting of 100). The cliques found are the fully connected subgraphs of this graph. For every clique, EMT computes the most frequently occurring words in the subjects of the clique's emails, which often reveal the subject matter the clique typically discusses.
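The graph construction and clique search described above can be sketched in Python as follows. Reference [0] is not reproduced here, so the sketch substitutes the classic Bron-Kerbosch branch-and-bound enumeration of maximal cliques (with pivoting); it is not necessarily EMT's exact algorithm. The names `build_graph`, `maximal_cliques` and `top_subject_words`, the `(sender, recipients, subject)` message tuples, and the rule that a clique's emails are those sent and received entirely within the clique are all assumptions for illustration. The edge test uses the "greater than the threshold" rule stated in this paragraph.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Sequence, Set, Tuple


def build_graph(sent: Dict[Tuple[str, str], int],
                threshold: int) -> Dict[str, Set[str]]:
    """Join two accounts when the emails exchanged between them, summed
    over both directions, exceed the threshold."""
    exchanged: Counter = Counter()
    for (sender, rcpt), n in sent.items():
        if sender != rcpt:
            exchanged[frozenset((sender, rcpt))] += n
    adj: Dict[str, Set[str]] = {}
    for pair, n in exchanged.items():
        if n > threshold:
            a, b = tuple(pair)
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Bron-Kerbosch with pivoting: every maximal fully connected subgraph."""
    found: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # p: candidates that extend r; x: already-explored vertices.
        if not p and not x:
            found.append(frozenset(r))  # r cannot be extended: maximal
            return
        # Skip the pivot's neighbours: branching on them only repeats work.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def top_subject_words(clique: FrozenSet[str],
                      messages: Iterable[Tuple[str, Sequence[str], str]],
                      k: int = 5) -> List[str]:
    """Most frequent subject words among messages kept inside the clique."""
    counts: Counter = Counter()
    for sender, rcpts, subject in messages:
        if sender in clique and rcpts and set(rcpts) <= clique:
            counts.update(w.lower() for w in subject.split())
    return [w for w, _ in counts.most_common(k)]


sent = {("a", "b"): 60, ("b", "a"): 50, ("b", "c"): 120,
        ("a", "c"): 101, ("c", "d"): 30}
adj = build_graph(sent, threshold=100)
print([sorted(c) for c in maximal_cliques(adj)])  # [['a', 'b', 'c']]
```

Summing both directions reflects "emails exchanged between them"; a directed variant would keep the two counts separate.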

(The reader is cautioned not to confuse the computation of cliques with the maximum clique problem, which is NP-complete. Here we are computing the set of all cliques in an email archive, which in practice runs in near-linear time.)

2.5.1 Chi Square + Cliques

The Chi Square + cliques (CS + cliques) feature in EMT is the same as the Chi Square window described in section 3.4.2, except that it also computes clique frequencies.
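Section 3.4.2 is not part of this excerpt, so the exact statistic behind the Chi Square window is not restated here. Purely as an illustration, the sketch below computes a Pearson chi-square statistic that compares a user's recipient frequencies in a recent window against his long-term profile; the name `chi_square` and the way expected counts are derived are assumptions.

```python
from typing import Dict


def chi_square(recent: Dict[str, int], profile: Dict[str, int]) -> float:
    """Pearson statistic sum((O - E)^2 / E) over the profiled recipients.

    Hypothetical stand-in for EMT's Chi Square window: E rescales the
    long-term profile to the size of the recent window. Recipients absent
    from the profile get no cell of their own; they only enlarge the
    window total.
    """
    n_recent = sum(recent.values())
    n_profile = sum(profile.values())
    if n_recent == 0 or n_profile == 0:
        return 0.0
    stat = 0.0
    for rcpt, hist in profile.items():
        if hist <= 0:
            continue  # a zero expected count would divide by zero
        expected = n_recent * hist / n_profile
        observed = recent.get(rcpt, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


profile = {"alice": 50, "bob": 50}
print(chi_square({"alice": 10, "bob": 10}, profile))  # 0.0
print(chi_square({"alice": 20}, profile))             # 20.0
```

A large statistic means the recent recipient mix departs from the user's habits; in the CS + cliques variant, clique pseudo-recipients would simply be additional cells.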

In summary, the clique algorithm is based on graph theory. It finds the largest cliques (groups of users) that are fully connected, where each connection carries a number of emails at least equal to the threshold (set at 50 by default). For example, if clique 1 consists of three users A, B and C, then A and B have exchanged at least 50 emails, and so have B and C, and A and C.
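The definition in this example can be checked directly. The sketch below, with hypothetical function names, tests whether a group of accounts is a clique under the "at least the threshold" rule stated in this paragraph, and whether it is one of the largest (maximal) cliques, i.e. no other account could join it. Exchanged-email counts are assumed to be keyed by unordered account pairs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def is_threshold_clique(members: Iterable[str],
                        exchanged: Dict[FrozenSet[str], int],
                        threshold: int = 50) -> bool:
    """Every pair of members has exchanged at least `threshold` emails."""
    return all(exchanged.get(frozenset(pair), 0) >= threshold
               for pair in combinations(members, 2))


def is_maximal_clique(members: Iterable[str], accounts: Iterable[str],
                      exchanged: Dict[FrozenSet[str], int],
                      threshold: int = 50) -> bool:
    """A threshold clique that no further account could join."""
    group = set(members)
    if not is_threshold_clique(group, exchanged, threshold):
        return False
    return not any(is_threshold_clique(group | {acct}, exchanged, threshold)
                   for acct in accounts if acct not in group)


exchanged = {frozenset(("A", "B")): 50, frozenset(("B", "C")): 73,
             frozenset(("A", "C")): 61, frozenset(("A", "D")): 80}
accounts = ["A", "B", "C", "D"]
print(is_maximal_clique({"A", "B", "C"}, accounts, exchanged))  # True
print(is_maximal_clique({"A", "B"}, accounts, exchanged))       # False
```

Here {A, B} passes the pairwise test but is not maximal, because C is connected to both; D cannot join {A, B, C} since it has no exchanges with B or C.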

The Clique Threshold field can be changed from this window; doing so recalculates the list of all cliques for the entire database, and the metrics in the window are readjusted automatically.

In this window, each clique is treated as if it were a single recipient, so that each clique has a frequency associated with it. Only the cliques to which the selected user belongs are displayed. Some users do not belong to any clique; for them, this window is identical to the normal Chi Square window.
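Treating a clique as a single recipient requires deciding when a message counts toward the clique. The text does not spell this out, so the sketch below adopts a hypothetical rule: a message sent by the selected user counts once for clique i when all other members of that clique are among its recipients, and those members are then not counted individually. The name `recipient_frequencies` is likewise illustrative. With no cliques, the counts reduce to plain per-recipient frequencies, mirroring the normal Chi Square window.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set


def recipient_frequencies(user: str,
                          sent_recipients: Iterable[Sequence[str]],
                          cliques: List[Set[str]]) -> Counter:
    """Recipient counts for `user`, with his cliques as pseudo-recipients.

    Hypothetical rule: a message counts once for "clique i" when it reaches
    every other member of that clique; matched members are then removed so
    they are not counted again individually. Cliques are checked in order.
    """
    # Number only the cliques the selected user belongs to: clique 1, 2, ...
    mine = [c for c in cliques if user in c]
    own = [(f"clique {i}", c - {user}) for i, c in enumerate(mine, start=1)]
    freq: Counter = Counter()
    for rcpts in sent_recipients:
        remaining = set(rcpts)
        for name, others in own:
            if others and others <= remaining:
                freq[name] += 1
                remaining -= others
        freq.update(remaining)  # everyone left is an individual recipient
    return freq


cliques = [{"u", "a", "b"}, {"x", "y", "z"}, {"u", "c", "d"}]
mail = [["a", "b"], ["a", "b", "e"], ["a"], ["c", "d"]]
print(sorted(recipient_frequencies("u", mail, cliques).items()))
# [('a', 1), ('clique 1', 2), ('clique 2', 1), ('e', 1)]
```

A message reaching only part of a clique (the third one above) is counted per recipient, so partial matches remain visible in the frequency table.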

If the selected user belongs to one or more cliques, each clique appears under the name clique i (i = 1, 2, ...) and is displayed in a green cell to distinguish it from individual email account recipients. (Double-clicking a clique's green cell pops up a window listing the clique's members.)
