software development can have a significant impact on whether a project is completed
on time, within the allocated budget, and with the desired functionality. Some Internet-
scale code search engines, such as Koders, Google Code Search, SourceForge, and
Krugle, have emerged to fill this niche, but little work has been conducted to under-
stand developers' search process when using these tools.
We conducted an experiment to better understand how people search for code
on the Web. Twenty-four subjects participated in the study. They were given a sce-
nario and asked to search for source code to satisfy the task described. When the
subjects settled on a query that produced satisfactory results, they were asked to
judge the relevance of the first ten results (P@10). The scenarios varied along two
dimensions: intention of search (as-is reuse or reference example) and size of the
search target (block or subsystem) [17]. We used these two dimensions as between-
subjects independent variables in our experiment. There are also different kinds of
search engines that can be used to locate code on the Web and we used this factor
as a within-subjects independent variable with five levels (Google, Koders, Krugle,
Google Code Search, and SourceForge). The dependent variables were the length of
the query, the number of queries in a session, the clickthrough rate on results, pre-
cision of the first ten results (P@10), and the duration of the session. Each of these
variables provided insight into different stages of the search process. The findings
reported in this chapter complement a previously published paper that used the
same data [14] and focused on the P@10 dependent variable and insights into the
search engines.
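As background, the P@10 measure used above can be computed directly from a searcher's binary relevance judgments of the first ten results. The following sketch is illustrative; the function name and the sample judgments are assumptions, not data from the study:

```python
def precision_at_k(judgments, k=10):
    """P@k: fraction of the first k results judged relevant (1) vs. not (0)."""
    top_k = judgments[:k]
    if not top_k:
        return 0.0
    return sum(top_k) / len(top_k)

# Hypothetical session: the subject judged 6 of the first 10 results relevant
judgments = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
print(precision_at_k(judgments))  # 0.6
```

Because each subject judged exactly the first ten results, P@10 here is simply the count of relevant results divided by ten.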
On average, the developers used each search engine for 6 minutes. During that time,
they entered an average of 2.4 queries with 4 terms each. They navigated to 62%
of the search results overall, but to a higher proportion of relevant results (81%)
than of irrelevant ones (47%).
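The clickthrough figures above can be read as rates computed over each result list, split by the subject's relevance judgment. A minimal sketch, where the record layout and flag names are assumptions rather than the study's actual instrumentation:

```python
def clickthrough_rate(results):
    """Fraction of results in a list that the searcher navigated to."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["clicked"]) / len(results)

# Hypothetical result list with per-result click and relevance flags
results = [
    {"clicked": True,  "relevant": True},
    {"clicked": True,  "relevant": True},
    {"clicked": False, "relevant": False},
    {"clicked": True,  "relevant": False},
    {"clicked": False, "relevant": True},
]
overall = clickthrough_rate(results)                                          # 3/5
on_relevant = clickthrough_rate([r for r in results if r["relevant"]])        # 2/3
on_irrelevant = clickthrough_rate([r for r in results if not r["relevant"]])  # 1/2
```

Splitting the rate by relevance, as the study does, shows whether searchers preferentially opened results they would later judge relevant.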
We obtained a variety of statistically significant results, including some interaction
effects. Participants used more terms in their queries when searching on Google or
when searching for reference examples. They made more query revisions when
searching for blocks of code and for reference examples. When they specialized a
query by adding terms, their most likely next step was to generalize it by removing
terms. They had a lower clickthrough rate on SourceForge or when searching for
code to reuse. Furthermore, they spent more time when searching for reference
examples or blocks of code, or when using Google. We found that
searches for reference examples gave a higher P@10, that is, a higher proportion
of relevant results. Overall, Google gave the most relevant results; Koders and
Krugle gave more relevant results on searches for subsystems, while Google gave
more relevant results on searches for blocks.
One consistent trend across all the dependent variables is that more effort was
expended on searches for reference examples than for components to reuse as-is.
Reference example searches involved more terms per query on average, more queries
per session, a higher clickthrough rate, and more time. The additional effort was
rewarded by more relevant results among the first ten matches.
Compared to users of general Web search, developers performing code search
issued more queries, used longer queries, and made greater use of advanced
features. We synthesized these statistical results into a model of user behavior