DynaMine: Finding Usage Patterns and Their Violations by Mining Software Repositories - Mining Software Specifications: Methodologies and Applications


collected from the literature, newsgroups, and previous bug reports, application programmers are rarely able to tell which invariants hold for the APIs they use. The situation is only slightly better when it comes to software architects and API designers, who are generally much more aware of application-specific patterns.

In this chapter we propose an automatic way to extract likely error patterns by mining software revision histories. Moreover, to ensure that all the errors we find are relatively easy to confirm and fix, we pay particular attention in our experiments to errors that can be fixed with a one-line change. It is worth pointing out that many well-known error patterns such as memory leaks, double frees, mismatched locks, open and close operations on operating system resources, buffer overruns, and format string errors can often be addressed with a one-line fix. Looking at incremental changes between revisions, as opposed to complete snapshots of the source, allows us to better focus our mining strategy and obtain more precise results. Our approach uses revision history information to infer likely error patterns. We then experimentally evaluate the patterns we extracted by checking for them dynamically.
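
To make the idea of mining incremental changes concrete, here is a minimal sketch of how one revision step might be reduced to the set of method calls it adds. It is an illustration under stated assumptions, not DynaMine's actual preprocessing: the helper names `calls_in` and `added_calls` are hypothetical, and call sites are found with a crude regular expression rather than a real Java parser.

```python
import re
from collections import Counter

# Crude heuristic (assumption): an identifier followed by "(" is a call site.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")

# Keywords that match the pattern above but are not method calls.
NOT_CALLS = {"if", "for", "while", "switch", "catch", "return", "synchronized"}


def calls_in(source):
    """Count call-like identifiers in a source snippet."""
    return Counter(n for n in CALL_RE.findall(source) if n not in NOT_CALLS)


def added_calls(old_source, new_source):
    """Names of calls that occur more often in the new revision than in the old."""
    # Counter subtraction keeps only positive differences, i.e. additions.
    delta = calls_in(new_source) - calls_in(old_source)
    return set(delta)


# Hypothetical one-line fix: a missing release() is added in the new revision.
old = "void f() { lock.acquire(); work(); }"
new = "void f() { lock.acquire(); try { work(); } finally { lock.release(); } }"
print(added_calls(old, new))
```

Each set of added calls can then serve as one transaction for a pattern miner, so that calls frequently added together across check-ins surface as candidate patterns.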

We have performed experiments on Eclipse and jEdit, two large, widely used open-source Java applications. Both Eclipse and jEdit have many man-years of software development behind them and, as a collaborative effort of hundreds of people across different locations, are good targets for revision history mining. By mining CVS, we have identified 56 high-probability patterns in Eclipse and jEdit APIs, all of which were previously unknown to us. Out of these, 21 were dynamically confirmed as valid patterns and 263 pattern violations were found.

7.1.1 Contributions

This chapter makes the following contributions:

We present DynaMine,¹ a tool for discovering usage patterns and detecting their violations in large software systems [28, 29]. All of the steps involved in mining and running the instrumented application are accessible to the user from within an Eclipse plugin: DynaMine automates the task of collecting and pre-processing revision history entries and mining for common patterns. Likely patterns are then presented to the user for review; runtime instrumentation is generated for the patterns that the user deems relevant. Results of dynamic analysis are also presented to the user in an Eclipse view.

We propose a data mining strategy that detects common usage patterns in large software systems by analyzing software revision histories. Our strategy is based on the classic Apriori data mining algorithm, which we

¹ The name DynaMine comes from the combination of Dynamic analysis and Mining revision histories.
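
For readers unfamiliar with Apriori, the following is a minimal sketch of the textbook algorithm only; it does not reproduce whatever adaptations DynaMine makes to it. The function name `apriori`, the `min_support` threshold, and the method names in the sample transactions are all hypothetical. Each transaction stands for the set of method calls added in one check-in.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): support_count} for itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Count support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical check-ins, each listing the method calls it added.
checkins = [
    {"addListener", "removeListener"},
    {"addListener", "removeListener", "dispose"},
    {"addListener", "removeListener"},
    {"dispose"},
    {"beginRule", "endRule"},
]
patterns = apriori(checkins, min_support=3)
```

With a support threshold of 3, the only multi-item set that survives here is the pair of calls added together in three check-ins, which is the kind of co-occurrence a usage pattern would be built from.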
