Merlin: Specification Inference for Explicit Information Flow Problems - Mining Software Specifications: Methodologies and Applications - page 383

Databases Reference

In-Depth Information

interprocedural data flow. Nodes of the propagation graph are methods in the

program, and edges represent explicit flow of data.

Denition 11.2.1 A propagation graph is a directed graph G = hN G ;E G i,

where nodes N G are methods and an edge (n 1 ! n 2 ) 2 E g indicates that

there is a flow of information from method n 1 to method n 2 through method

arguments, or return values, or indirectly through pointers.

The propagation graph is a representation of the interprocedural data flow

produced by static analysis of the program. Due to the presence of virtual

functions and information flow through references, a pointer analysis is needed

to produce a propagation graph. Since pointer analyses are imprecise, edges of

a propagation graph approximate information flows. Improving the accuracy

of propagation graph involves improving the precision of pointer analysis and

is beyond the scope of this chapter. In our implementation,Cat.Netuses an

unsound pointer analysis, so there could be flows of data that our propagation

graph does not represent. Even though propagation graphs can have cycles,

Merlinperforms a breadth-first search and deletes the edges that close cycles,

resulting in an acyclic propagation graph. Removing cycles greatly simplifies

the subsequent phases ofMerlin. As our empirical results show, even with

such approximations in the propagation graph,Merlinis able to infer useful

specifications.

Example 2 We illustrate propagation graphs with an example.

publicvoidTestMethod1(){

stringa=ReadData1();

stringb=Prop1(a);

stringc=Cleanse(b);

WriteData(c);

ReadData1

ReadData2

}

Prop1

Prop2

publicvoidTestMethod2(){

stringd=ReadData2();

stringe=Prop2(d);

stringf=Cleanse(e);

WriteData(f);

Cleanse

WriteData

}

In addition to two top-level \driver" methods, TestMethod1 and

TestMethod2 , this code uses six methods: ReadData1 , ReadData2 , Prop1 ,

Prop2 , Cleanse , and WriteData . This program gives rise to the propaga-

tion graph shown on the right. An edge in the propagation graph represents

explicit flow of data, i.e., the value returned by Prop1 is passed into Cleanse

as an argument. The edge from Prop1 to Cleanse represents this flow.

2

11.2.1 Assumptions and Probabilistic Inference

The crux of our approach is probabilistic inference: We first use the prop-

agation graph to generate a set of probabilistic constraints and then use prob-

abilistic inference to solve them.Merlinuses factor graphs (see Section 11.3)

Next Page

Mining Software Specifications: Methodologies and Applications

Search WWH ::

Custom Search

Home