4 Synopsis Techniques
Evidence of crimes can be found in network events that are well defined by
various fields in packet headers and/or by application-dependent payloads. A
brute-force approach that entails storing entire packets is not feasible for
reasons discussed in Section 1. Therefore, we adopt synopsis techniques to record
the network events in a succinct form for a prolonged period of time. A synopsis
is a succinct representation of massive base data that can provide approximate
answers to queries about base data within a predefined level of confidence. Defining
a network event can be as simple as concatenating a set of values from fields
in the header of a single packet or it can be as complex as extracting various
features from a collection of several packets. For example, a TCP connection
establishment event can be defined by a pair of IP addresses and a pair of port
numbers from a TCP packet with the SYN flag set. On the other hand, a port scanning
event cannot be defined simply by a set of fields in the packet headers or by the
payload. Defining a port scan would involve computing a variety of properties of
network traffic, such as the packet arrival rate per host and the distribution of
source IP addresses. In this section we briefly present a few useful synopsis techniques
that can be used to capture network events succinctly. A detailed discussion of
these and other techniques can be found in [35].
Connection Records: Network events, such as connection establishment and teardown,
are usually well defined by various fields in the packet headers. A simple
scheme that is sufficient to answer queries of the form “who established a
connection during time t?” and “how long did the connection last?” is to store
the pair of endpoints (host, port) with the TCP flags. We call this a connection
record. We can store connection records for all TCP connections, within a given
time window, by recording a packet whenever any of the three flags SYN, FIN, or
RST is set. The cost for this is roughly 26 bytes per TCP connection. So, for
example, in a moderately loaded network, where on average we expect to see 100
connections a second, this would require about 2600 bytes to be timestamped and
stored every second. The total cost over a day of these records would only be
about 200MB.
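As a rough illustration of a connection record and its storage cost, the following Python sketch packs the endpoint pair and TCP flags into a fixed-size binary record. The field layout, the function names, and the example addresses are assumptions made for illustration only; the text does not specify how the 26 bytes per connection are laid out, so that figure is simply taken as an input to the cost estimate.

```python
import socket
import struct

# TCP header flag bits.
FIN, SYN, RST = 0x01, 0x02, 0x04

# Hypothetical layout (not from the text): source IPv4, destination IPv4,
# source port, destination port, TCP flags, in network byte order.
RECORD_FORMAT = "!4s4sHHB"


def make_connection_record(src_ip, src_port, dst_ip, dst_port, flags):
    """Return a packed record, or None if none of SYN, FIN, RST is set."""
    if not flags & (SYN | FIN | RST):
        return None
    return struct.pack(
        RECORD_FORMAT,
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        src_port,
        dst_port,
        flags,
    )


def daily_storage_bytes(bytes_per_connection, connections_per_second):
    """Estimate the bytes stored per day at a steady connection rate."""
    return bytes_per_connection * connections_per_second * 86_400


record = make_connection_record("10.0.0.1", 51512, "192.0.2.7", 80, SYN)
print(len(record))                    # 13 bytes in this assumed layout
print(daily_storage_bytes(26, 100))   # 224640000 bytes, i.e. roughly 225 MB
```

A timestamp would still have to be attached to each record, or to each batch of records, before it is written out; the sketch leaves that part open.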
Bloom Filters: A different and potentially more space-efficient approach to tracking
network events can be arrived at by using Bloom filters [8]. A Bloom filter is a
probabilistic data structure that quickly tests membership in a large set by using
multiple hash functions to index into a single array of bits. Space efficiency is
achieved at the cost of a small probability of false positives. For example, we adapt Bloom filters to
keep track of connections by periodically inserting the hash of the pair of IP
addresses, the port numbers and TCP flags of packets with TCP SYN, FIN, or
RST flags set. To check whether a particular connection was seen during a time
window, we compute the necessary hashes (on IP, port, and TCP flag fields) and
test them against all available Bloom filters for the time window. If a Bloom filter
answers “yes” then we know the connection packet was seen with a confidence
level determined by the false positive rate of the filter. Otherwise we know for
certain that the corresponding packet was not seen.
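The insert-and-query procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by the authors: the filter size, the number of hash functions, the use of salted SHA-256 to derive the hash positions, the key encoding, and all names (BloomFilter, connection_key, record_packet, windows_containing) are hypothetical choices. The false positive estimate uses the standard approximation (1 - e^(-kn/m))^k for m bits, k hash functions, and n inserted items.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index
        # (an illustrative choice, not taken from the text).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def false_positive_rate(num_bits, num_hashes, num_items):
    """Standard approximation of the false positive probability."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def connection_key(src_ip, src_port, dst_ip, dst_port, flag):
    """Encode the hashed fields (IPs, ports, TCP flag) as bytes."""
    return f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{flag}".encode()


# One filter per time window (window ids are hypothetical integers).
windows = {}


def record_packet(window_id, key):
    windows.setdefault(window_id, BloomFilter(8192, 4)).add(key)


def windows_containing(window_ids, key):
    """Return the windows whose filter answers "yes" for this key."""
    return [w for w in window_ids if w in windows and key in windows[w]]


syn = connection_key("10.0.0.1", 51512, "192.0.2.7", 80, "SYN")
record_packet(0, syn)
print(windows_containing([0, 1], syn))  # [0]
```

A "yes" from windows_containing holds only up to the false positive rate of each filter, whereas a window missing from the result definitely did not record the packet.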