Our goal was to cover a wide range of people who often search for source code, in
order to get a representative sample. The population was any programmer who had
searched for source code on the Internet. However, it was not possible to obtain a
systematically random sample, so availability sampling, also known as convenience
sampling, was the chosen sampling technique.
Convenience sampling may pose a threat to the external validity of the results.
However, this was an exploratory study whose goal was to collect data on the variety
of behaviors, not their prevalence, so availability sampling was considered adequate
for this task. We solicited participants from a number of mailing lists and
newsgroups. We also attempted to solicit participants through open source news web
sites, but were declined. The mailing lists and newsgroups gave us access to a large
number of developers and users of open source software, as well as developers who
worked on proprietary and commercial software.
The survey was open for six months in 2006-2007. Invitations to participate were
posted to the Javaworld mailing list and to the following mailing lists and
newsgroups: beginners-cgi@perl.org, comp.software-engg, comp.lang.c, and
comp.lang.java. We chose these forums because they had users with a variety of
interests, the discussions were highly technical in nature, and there was little
overlap between the groups.
3.3.3 Data Analysis
The data was analyzed using a combination of quantitative and qualitative
techniques. The multiple-choice questions were coded using nominal and ordinal scale
variables. For the open-ended questions, the responses were text descriptions that
were analyzed qualitatively. We analyzed them for recurring patterns using open
coding [7] and a grounded theory approach [17]. Without making prior assumptions
about what we would find, we developed codes for categories iteratively and
inductively. The two authors analyzed the data separately, and we found a high level
of agreement in our categories. Subsequently, we combined our codes and refined the
categories for clarity of presentation.
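
As an illustration of the distinction between nominal and ordinal coding of
multiple-choice answers, the following is a minimal sketch in Python. The question
wording, the answer options, the sample responses, and all function names are
hypothetical and chosen only for illustration; they are not taken from the survey
instrument or its data. Nominal answers are only counted, while ordinal answers are
mapped to ranks so that an order-based summary such as the median category is
meaningful.

```python
from collections import Counter
from statistics import median_low

# Hypothetical ordinal question: "How often do you search for source code?"
# The options are listed from lowest to highest frequency (assumed scale).
FREQUENCY_SCALE = ["never", "rarely", "monthly", "weekly", "daily"]
FREQUENCY_RANK = {label: rank for rank, label in enumerate(FREQUENCY_SCALE)}


def code_ordinal(responses):
    """Map ordinal answers to integer ranks that preserve their order."""
    return [FREQUENCY_RANK[answer] for answer in responses]


def median_category(responses):
    """Return the median answer; median_low always yields an observed rank."""
    return FREQUENCY_SCALE[median_low(code_ordinal(responses))]


def count_nominal(responses):
    """Nominal answers have no order, so only frequencies are meaningful."""
    return Counter(responses)


if __name__ == "__main__":
    # Made-up answers, for illustration only.
    tools = ["web search", "web search", "project hosting site"]
    frequency = ["rarely", "weekly", "daily", "daily"]
    print(count_nominal(tools))
    print(code_ordinal(frequency))
    print(median_category(frequency))
```

Keeping the ordinal scale as an explicit ordered list makes the assumed ordering
visible and avoids treating the integer ranks as if they were interval-scale
measurements.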
3.3.4 Threats to Validity
The main shortcoming of this study is generalizability, i.e., the sample of
respondents is not sufficiently representative of the population. This is a basic
problem with empirical research in software engineering: there is no reliable model
of population characteristics against which the representativeness of a sample can
be assessed.