Merlin: Specification Inference for Explicit Information Flow Problems - Mining Software Specifications: Methodologies and Applications

Databases Reference

In-Depth Information

The last several years have seen a proliferation of static and runtime analy-

sis tools for finding security violations that are caused by explicit information

flow in programs. Much of this interest has been caused by the increase in the

number of vulnerabilities such as cross-site scripting and SQL injection. In

fact, these explicit information flow vulnerabilities commonly found in Web

applications now outnumber vulnerabilities such as buffer overruns common

in type-unsafe languages such as C and C++. Tools checking for these vulner-

abilities require a specification to operate. In most cases the task of providing

such a specification is delegated to the user. Moreover, the ecacy of these

tools is only as good as the specification. Unfortunately, writing a compre-

hensive specification presents a major challenge: Parts of the specification are

easy to miss, leading to missed vulnerabilities; similarly, incorrect specifica-

tions may lead to false positives.

This paper proposesMerlin, a new approach for automatically inferring

explicit information flow specifications from program code. Such specifications

greatly reduce manual labor, and enhance the quality of results, while using

tools that check for security violations caused by explicit information flow.

Beginning with a data propagation graph, which represents interprocedural

flow of information in the program,Merlinaims to automatically infer an

information flow specication.Merlinmodels information flow paths in the

propagation graph using probabilistic constraints. A na ve modeling requires

an exponential number of constraints, one per path in the propagation graph.

For scalability, we approximate these path constraints using constraints on

chosen triples of nodes, resulting in a cubic number of constraints. We char-

acterize this approximation as a probabilistic abstraction, using the theory

of probabilistic refinement developed by McIver and Morgan. We solve the

resulting system of probabilistic constraints using factor graphs, which are a

well-known structure for performing probabilistic inference.

We experimentally validate theMerlinapproach by applying it to 10 large

business-critical Web applications that have been analyzed withCat.Net,

a state-of-the-art static analysis tool for .NET. We find a total of 167 new

confirmed specifications, which result in a total of 322 additional vulnerabil-

ities across the 10 benchmarks. More accurate specifications also reduce the

false positive rate: In our experiments,Merlin-inferred specifications result

in 13 false positives being removed; this constitutes a 15% reduction in the

Cat.Netfalse positive rate on these 10 programs. The final false positive rate

forCat.Netafter applyingMerlinin our experiments drops to under 1%.

Search WWH ::

Custom Search

Home