11.1 Introduction
Constraining information flow is fundamental to security: We do not want
secret information to reach untrusted principals (confidentiality), and we do
not want untrusted principals to corrupt trusted information (integrity). If we
take confidentiality and integrity to the extreme, then principals from different
levels of trust can never interact, and the resulting system becomes unusable.
For instance, such a draconian system would never allow a trusted user to
view untrusted content from the Internet.
Thus, practical systems compromise on such extremes and allow flow of
sanitized information across trust boundaries. For instance, it is unacceptable
to take a string from untrusted user input and use it as part of an SQL query,
since it leads to SQL injection attacks. However, it is acceptable to first pass
the untrusted user input through a trusted sanitization function, and then
use the sanitized input to construct an SQL query. Similarly, confidential
data need to be cleansed to avoid information leaks. Practical checking tools
that have emerged in recent years [4, 21, 24] typically aim to ensure that all
explicit flows of information across trust boundaries are sanitized.
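As an illustration of the SQL example above, the following sketch contrasts an unsanitized flow with a sanitized one. The `sanitize` function and the table layout are hypothetical stand-ins for whatever trusted sanitizer a real application would use (in practice, parameterized queries are the preferred defense):

```python
import sqlite3

def sanitize(user_input: str) -> str:
    # Hypothetical sanitizer: keep only characters with no special
    # meaning in SQL string literals.
    return "".join(ch for ch in user_input if ch.isalnum() or ch in " _-")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

untrusted = "alice' OR '1'='1"

# Unacceptable: untrusted input flows directly into the query string,
# a source-to-sink flow with no sanitizer on the path.
unsafe_query = "SELECT * FROM users WHERE name = '%s'" % untrusted

# Acceptable: the flow passes through a trusted sanitizer first.
safe_query = "SELECT * FROM users WHERE name = '%s'" % sanitize(untrusted)

rows = conn.execute(safe_query).fetchall()
```

Executing `unsafe_query` returns every row of the table, because the injected `OR '1'='1'` clause is always true; the sanitized query matches no rows.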
The fundamental program abstraction used in the sequel (as well as by
existing tools) is what we term the propagation graph: a directed graph that
models all interprocedural explicit information flow in a program.¹ The nodes
of a propagation graph are methods, and edges represent explicit information
flow between methods. There is an edge m1 → m2 whenever there
is a flow of information from method m1 to method m2, through a method
parameter, through a return value, or by way of an indirect update through
a pointer.
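Concretely, a propagation graph can be represented as a plain adjacency map from each method to its successors. The method names below are hypothetical; a real tool would extract the nodes and edges from a program analysis:

```python
# Propagation graph: nodes are methods; a directed edge m1 -> m2 records an
# explicit flow of data from m1 to m2 (via a parameter, a return value, or
# an indirect update through a pointer).
propagation_graph = {
    "ReadInput": ["Normalize"],        # data from the source reaches Normalize
    "Normalize": ["Sanitize", "Log"],  # regular node propagating to successors
    "Sanitize":  ["RunQuery"],         # sanitized data flows to the sink
    "Log":       [],
    "RunQuery":  [],
}
```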
Following the widely accepted Perl taint terminology conventions [30],
more precisely defined in [14], nodes of the propagation graph are classified
as sources, sinks, and sanitizers; nodes not falling into these categories are
termed regular nodes. A source node returns tainted data, whereas it is an
error to pass tainted data to a sink node. Sanitizer nodes cleanse, untaint, or
endorse information to mediate across different levels of trust. Regular nodes
do not taint data, and it is not an error to pass tainted data to them; they
merely propagate it to their successors without any mediation.
A classification of nodes in a propagation graph into sources, sinks, and
sanitizers is called an information flow specification, or just specification for
brevity. Given a propagation graph and a specification, one can easily run a
reachability algorithm to check whether all paths from sources to sinks pass
through a sanitizer.
¹ We do not focus on implicit information flows [27] in this paper. Discussions with
CAT.NET [21] developers reveal that detecting explicit information flow vulnerabilities is a
more urgent concern. Existing commercial tools in this space focus exclusively on explicit
information flow.
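The reachability check just described can be sketched as a graph traversal that propagates taint from each source but refuses to propagate it past a sanitizer; any sink reached this way witnesses an unsanitized path. The graph and method names below are hypothetical:

```python
from collections import deque

def unsanitized_flows(graph, sources, sinks, sanitizers):
    """Return (source, sink) pairs joined by a path that avoids all sanitizers."""
    violations = []
    for src in sources:
        seen = {src}
        work = deque([src])
        while work:
            node = work.popleft()
            if node in sinks:
                violations.append((src, node))
            for succ in graph.get(node, []):
                # Sanitizers mediate the flow: do not propagate taint past them.
                if succ not in sanitizers and succ not in seen:
                    seen.add(succ)
                    work.append(succ)
    return violations

graph = {
    "ReadInput":  ["BuildQuery", "Sanitize"],
    "Sanitize":   ["BuildQuery"],
    "BuildQuery": ["RunQuery"],
    "RunQuery":   [],
}
bad = unsanitized_flows(graph, sources={"ReadInput"}, sinks={"RunQuery"},
                        sanitizers={"Sanitize"})
# The direct path ReadInput -> BuildQuery -> RunQuery bypasses the
# sanitizer, so this specification is violated.
```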