software development can have a significant impact on whether a project is completed
on time, within the allocated budget, and with the desired functionality. Some Internet-
scale code search engines, such as Koders, Google Code Search, SourceForge, and
Krugle, have emerged to fill this niche, but little work has been conducted to under-
stand developers' search process when using these tools.
We conducted an experiment to better understand how people search for code
on the Web. Twenty-four subjects participated in the study. They were given a sce-
nario and asked to search for source code to satisfy the task described. When the
subjects settled on a query that produced satisfactory results, they were asked to
judge the relevance of the first ten results (P@10). The scenarios varied along two
dimensions: intention of search (as-is reuse or reference example) and size of the
search target (block or subsystem) [17]. We used these two dimensions as between-
subjects independent variables in our experiment. There are also different kinds of
search engines that can be used to locate code on the Web and we used this factor
as a within-subjects independent variable with five levels (Google, Koders, Krugle,
Google Code Search, and SourceForge). The dependent variables were the length of
the query, the number of queries in a session, the clickthrough rate on results, pre-
cision of the first ten results (P@10), and the duration of the session. Each of these
variables provided insight into different stages of the search process. The findings
reported in this chapter complement a previously published paper that used the
same data [14] and focused on the P@10 dependent variable and insights into the
search engines.
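As background, the P@10 measure used above can be computed directly from a searcher's binary relevance judgments of the first ten results. The following sketch is illustrative; the function name and the sample judgments are assumptions, not data from the study:

```python
def precision_at_k(judgments, k=10):
    """P@k: fraction of the first k results judged relevant (1) vs. not (0)."""
    top_k = judgments[:k]
    if not top_k:
        return 0.0
    return sum(top_k) / len(top_k)

# Hypothetical session: the subject judged 6 of the first 10 results relevant
judgments = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
print(precision_at_k(judgments))  # 0.6
```

Because each subject judged exactly the first ten results, P@10 here is simply the count of relevant results divided by ten.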
On average, the developers used each search engine for 6 minutes. During that time,
they entered an average of 2.4 queries with 4 terms each. They navigated to 62%
of the search results overall, but to a higher proportion of relevant results (81%)
than of irrelevant ones (47%).
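The clickthrough figures above can be read as rates computed over each result list, split by the subject's relevance judgment. A minimal sketch, where the record layout and flag names are assumptions rather than the study's actual instrumentation:

```python
def clickthrough_rate(results):
    """Fraction of results in a list that the searcher navigated to."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["clicked"]) / len(results)

# Hypothetical result list with per-result click and relevance flags
results = [
    {"clicked": True,  "relevant": True},
    {"clicked": True,  "relevant": True},
    {"clicked": False, "relevant": False},
    {"clicked": True,  "relevant": False},
    {"clicked": False, "relevant": True},
]
overall = clickthrough_rate(results)                                          # 3/5
on_relevant = clickthrough_rate([r for r in results if r["relevant"]])        # 2/3
on_irrelevant = clickthrough_rate([r for r in results if not r["relevant"]])  # 1/2
```

Splitting the rate by relevance, as the study does, shows whether searchers preferentially opened results they would later judge relevant.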
We obtained a variety of statistically significant results, including some interaction
effects. Participants used more terms in their queries when searching on Google or
when searching for reference examples. They made more query revisions when
searching for blocks of code and for reference examples. When they specialized a
query by adding terms, their most likely next step was to generalize it by removing
terms. They had a lower clickthrough rate on SourceForge or when searching for
code to reuse. Furthermore, they spent more time when searching for reference
examples or blocks of code, or when using Google. We found that
searches for reference examples gave a higher P@10, that is, a higher proportion
of relevant results. Overall, Google gave the most relevant results; Koders and
Krugle gave more relevant results on searches for subsystems, while Google gave
more relevant results on searches for blocks.
One consistent trend across all the dependent variables is that more effort was
expended on searches for reference examples than for components to reuse as-is.
Reference example searches involved more terms per query on average, more queries
per session, a higher clickthrough rate, and more time. The additional effort was
rewarded by more relevant results among the first ten matches.
Compared to users of general Web search, developers performing code search
issued more queries, used longer queries, and made greater use of advanced
features. We synthesized these statistical results into a model of user behavior