Test-Driven Reuse: Key to Improving Precision of Search Engines for Software Reuse - Finding Source Code on the Web for Remix and Reuse


and the retrieval problem, dealing with how to efficiently retrieve the artifacts that are useful in a given context, are still in the focus of interest in the research community. Garcia et al. [10] have recently underlined this with their list of requirements for a component search engine: amongst other challenges, they see simple query formulation and good retrieval quality at the heart of a successful and scalable component search engine. Unfortunately, as has been shown recently [17], simple keyword- or signature-driven searches may lead to tens of thousands of results, each of which, in principle, matches the given query criterion. However, a search result does not necessarily become relevant for the user merely because a technical matching criterion (relatively simple in these two cases) is fulfilled (see e.g. [26]). Consider, for example, a search for a reusable “spreadsheet” component that merely delivers a test case for a spreadsheet, because the test case naturally has to use a spreadsheet and thus contains the term. A user presented with such a result would certainly be disappointed and, after inspecting perhaps five or ten similar results, would not consider using the search engine again, since in a reuse context it is important to get results precisely matching a given specification [1].
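The spreadsheet example can be made concrete with a minimal sketch. The three-file corpus and the function name `keyword_search` below are invented purely for illustration and do not come from any real search engine; the point is only that a plain keyword match cannot tell the reusable component apart from its test case or from a class that merely mentions the term.

```python
# Hypothetical mini-corpus (file name -> source text), invented for illustration.
CORPUS = {
    "Spreadsheet.java": (
        "public class Spreadsheet { "
        "public void put(String cell, String value) {} "
        "public String get(String cell) { return null; } }"
    ),
    "SpreadsheetTest.java": (
        "public class SpreadsheetTest extends TestCase { "
        "public void testPut() { Spreadsheet s = new Spreadsheet(); } }"
    ),
    "ReportExporter.java": (
        "public class ReportExporter { "
        "/* writes a spreadsheet file */ public void export() {} }"
    ),
}


def keyword_search(corpus, keyword):
    """Return the names of all files whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return sorted(name for name, text in corpus.items() if needle in text.lower())


hits = keyword_search(CORPUS, "spreadsheet")
print(hits)  # every file matches, although only one is the component itself
```

All three files satisfy the matching criterion, yet only `Spreadsheet.java` would be relevant to a developer looking for a reusable component.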

Interestingly, most existing software search engines still rely on simple keyword matching and thus suffer from exactly this problem. Although it seems possible to narrow down the search results considerably, up to a certain degree, by adding more keywords, beyond that point there was still no intuitive approach for formulating interface-based or even specification-based queries in the second-generation software search engines described in another chapter of this book [5]. Only Google's code search engine allowed the use of (rather complex) regular expressions in order to describe the desired interface of a component.
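To give an impression of why such regular expressions tend to become complex, the following sketch describes a small interface (a class with `put(String, String)` and `get(String)` methods) as a pattern. It uses Python's `re` module, not the query syntax of Google's code search engine, and the interface, the pattern, and the name `matches_interface` are assumptions made for illustration only.

```python
import re

# Hypothetical interface query: a class whose name contains "Spreadsheet" and
# which declares, in any order,
#   void put(String <name>, String <name>)   and   String get(String <name>)
INTERFACE_QUERY = re.compile(
    r"class\s+\w*Spreadsheet\w*\b"
    r"(?=.*\bvoid\s+put\s*\(\s*String\s+\w+\s*,\s*String\s+\w+\s*\))"
    r"(?=.*\bString\s+get\s*\(\s*String\s+\w+\s*\))",
    re.DOTALL,  # let '.' span line breaks, since declarations sit on different lines
)


def matches_interface(source):
    """Return True if the source text appears to declare the desired interface."""
    return INTERFACE_QUERY.search(source) is not None


component = """
public class Spreadsheet {
    public String get(String cell) { return null; }
    public void put(String cell, String value) {}
}
"""

test_case = """
public class SpreadsheetTest extends TestCase {
    public void testPut() { Spreadsheet s = new Spreadsheet(); }
}
"""

print(matches_interface(component))
print(matches_interface(test_case))
```

Even for two methods the pattern is hard to read, and it still checks only the syntactic shape of the declarations, not what the methods actually do.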

12.3 Test-Driven Reuse

According to the classification of Mili et al. presented in Sect. 12.2, the test-driven reuse approach, which Hummel and Atkinson first introduced in 2004 [15], is a technique based on operational semantics and hence inspired by the ideas of Behavior Sampling by Podgurski and Pierce [30]. Because its samples are chosen at random, Behavior Sampling requires a relatively large number of samples even for simple functions and, to our knowledge, was never used in practice.
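The basic idea behind Behavior Sampling can be sketched as follows: candidates are executed on randomly drawn inputs and kept only if their outputs agree with the expected ones. This is a simplified illustration under stated assumptions, not Podgurski and Pierce's actual procedure; the function `behavior_sample`, the candidate functions, and the use of Python's built-in `max` as a stand-in for the expected outputs (which a user would otherwise have to supply) are all hypothetical.

```python
import random


def behavior_sample(candidates, input_generator, oracle, n_samples=50, seed=0):
    """Keep the candidates whose output equals the oracle's on every random sample."""
    rng = random.Random(seed)
    samples = [input_generator(rng) for _ in range(n_samples)]
    expected = [oracle(*args) for args in samples]

    survivors = []
    for name, func in candidates.items():
        try:
            agrees = all(func(*args) == want for args, want in zip(samples, expected))
        except Exception:
            agrees = False  # a candidate that crashes on a sample is rejected
        if agrees:
            survivors.append(name)
    return survivors


# Hypothetical candidates that all share the signature (int, int) -> int.
candidates = {
    "max_correct": lambda a, b: a if a >= b else b,
    "first_arg": lambda a, b: a,
    "add": lambda a, b: a + b,
}


def two_ints(rng):
    """Draw one random input pair."""
    return rng.randint(-100, 100), rng.randint(-100, 100)


survivors = behavior_sample(candidates, two_ints, oracle=max)
print(survivors)
```

A candidate that deviates only on rare inputs can easily survive a small random sample, which is one way to see why a relatively large number of samples is needed.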

What is extensively used in practice, by contrast, is (or at least should be) systematic software testing with targeted “samples” of a software's functionality, derived with the help of a systematic approach such as equivalence class partitioning. In so-called test-driven development [4], which is especially popular in agile development communities, test cases are even created before any production code is written and are used to monitor the production code's degree of completeness and correctness during development iterations. From this starting point it is just a small step to imagine the usefulness of test cases in determining the fitness for purpose of a reuse candidate. Assume as an example that we need a component offering the functionality of a typical spreadsheet application (such as Excel), i.e.,
