Archetypal Internet-Scale Source Code Searching - Finding Source Code on the Web for Remix and Reuse


Our goal was to cover a wide range of people who search for source code often, to get a representative sample. The population was any programmer who had searched for source code on the Internet. However, it was not possible to obtain a systematically random sample, so availability sampling, also known as convenience sampling, was the chosen sampling technique.

Convenience sampling may pose a threat to the external validity of the results. However, this was an exploratory study, and the goal was to collect data on a variety of behaviors rather than on their prevalence, so availability sampling was considered adequate for this task. We solicited participants from a number of mailing lists and newsgroups. This strategy gave us access to a large number of developers and users of open source software, as well as developers who worked on proprietary and commercial software. We also attempted to solicit participants through open source news web sites, but our requests were declined.

The survey was open for 6 months in 2006-2007 to collect responses. Invitations to participate in the survey were posted to the Javaworld mailing list and to the following mailing lists and newsgroups: beginners-cgi@perl.org, comp.software-engg, comp.lang.c, and comp.lang.java. We chose these forums because they had users with a variety of interests, the discussions were highly technical in nature, and there was little overlap between the groups.

3.3.3 Data Analysis

The data was analyzed using a combination of quantitative and qualitative techniques. The multiple-choice questions were coded using nominal and ordinal scale variables. For the open-ended questions, the responses were text descriptions that were analyzed qualitatively. We analyzed them for recurring patterns using open coding [7] and a grounded theory approach [17]. Without making prior assumptions about what we would find, we developed codes for categories iteratively and inductively. The two authors analyzed the data separately and found a high level of agreement between their categories. Subsequently, we combined our codes and refined the categories for clarity of presentation.
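As an illustration of what coding multiple-choice answers on nominal and ordinal scales involves, the sketch below maps a frequency answer to an ordered rank and tallies an unordered category. The answer options, the `FREQUENCY_SCALE` mapping, and the helper names are hypothetical; the survey's actual questions and coding scheme are not reproduced in this section.

```python
from collections import Counter
from statistics import median_low

# Hypothetical answer options for a frequency question; the survey's real
# wording is not reproduced in this section.
FREQUENCY_SCALE = {"never": 0, "rarely": 1, "sometimes": 2, "often": 3, "always": 4}
RANK_TO_LABEL = {rank: label for label, rank in FREQUENCY_SCALE.items()}


def code_ordinal(answer: str) -> int:
    """Code a frequency answer as an ordinal rank (order matters, distances do not)."""
    return FREQUENCY_SCALE[answer.strip().lower()]


def code_nominal(answers: list[str]) -> Counter:
    """Tally unordered categories (e.g. a language choice); only counts are meaningful."""
    return Counter(answer.strip() for answer in answers)


def typical_frequency(answers: list[str]) -> str:
    """Summarize ordinal data with the lower median, which always falls on the scale."""
    ranks = [code_ordinal(answer) for answer in answers]
    return RANK_TO_LABEL[median_low(ranks)]


if __name__ == "__main__":
    print(typical_frequency(["Often", "never", "sometimes", "often ", "rarely"]))
    print(code_nominal(["Java", "C", "Java ", "Perl"]))
```

The lower median is used here because averaging ordinal ranks would assume equal distances between answers such as "rarely" and "sometimes," which an ordinal scale does not guarantee.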

3.3.4 Threats to Validity

The main shortcoming of this study is generalizability, i.e., the sample of respondents is not sufficiently representative of the population. This is a basic problem with empirical research in software engineering: there is no reliable model of population characteristics against which the representativeness of a sample can be assessed.
