15.5.2 Example Overflow Implementation
15.5.2.1 Populating the Repository
Example Overflow uses Stack Overflow's API to request all questions relevant
to our current domain, jQuery, and filters out every question without an accepted
answer. It takes a conservative approach, using only accepted answers to
ensure that high-quality results are retrieved. The next step is to check whether
the accepted answer to each remaining question contains a code snippet. If so, the
snippet is extracted and saved to a database with all the accompanying information:
the question title, the question body, the answer body, the code snippet itself, the
user rating of the answer from Stack Overflow, the view count of the question, the
tags associated with the question, and other relevant information. This process can
be run as a scheduled task to keep the repository in sync with the data at Stack
Overflow.
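The filtering and extraction steps above can be sketched as follows. This is only an illustration, not Example Overflow's actual code: it assumes the questions and answers have already been fetched and are available as dictionaries whose keys (`accepted_answer_id`, `body`, `view_count`, and so on) are modeled on the Stack Overflow API, that code in an answer is marked up as `<pre><code>` blocks, and that one record is stored per snippet. The function names `extract_snippets` and `build_records` are hypothetical.

```python
import html
import re

# Assumption: code in an answer body is wrapped in <pre><code>...</code></pre>.
CODE_BLOCK = re.compile(
    r"<pre[^>]*>\s*<code[^>]*>(.*?)</code>\s*</pre>",
    re.DOTALL | re.IGNORECASE,
)


def extract_snippets(answer_html):
    """Return the code snippets found in an answer's HTML body."""
    return [html.unescape(raw).strip() for raw in CODE_BLOCK.findall(answer_html)]


def build_records(questions, answers_by_id):
    """Keep questions with an accepted answer that contains code.

    `questions` is a list of question dicts; `answers_by_id` maps an
    answer id to its answer dict. Both layouts are assumptions.
    """
    records = []
    for question in questions:
        answer = answers_by_id.get(question.get("accepted_answer_id"))
        if answer is None:
            continue  # no accepted answer: filtered out
        for code in extract_snippets(answer["body"]):
            records.append({
                "title": question["title"],
                "question": question["body"],
                "answer": answer["body"],
                "code": code,
                "answer_score": answer.get("score", 0),
                "view_count": question.get("view_count", 0),
                "tags": question.get("tags", []),
            })
    return records
```

Questions whose accepted answer has no code block produce no record, which mirrors the conservative filtering described above. Writing the records to a database and scheduling the job are left out of the sketch.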
15.5.2.2 Searching
Example Overflow uses keyword search based on the Apache Lucene [16] library,
which internally uses the term frequency-inverse document frequency (tf-idf)
weight [40]. For Apache Lucene to search, one needs to define which parameters
are to be analyzed and indexed. For the keyword search index, Example Overflow
uses both the code snippet and the additional metadata that accompanied the code
snippet at Stack Overflow. This allows a developer to find code snippets that do
not themselves contain the search keyword, provided that the keyword appears in
the contextual data and thereby indicates that the snippet has been used in that
context.
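To make the tf-idf weight concrete, here is a minimal sketch of the textbook definition. It is not Lucene's scoring code, which applies further normalization and boosting factors; the function name `tf_idf` and the token-list representation of documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Textbook tf-idf of `term` in one document.

    `doc_tokens` is the document as a list of tokens and `corpus` is a
    list of such token lists (an assumed, simplified representation).
    """
    if not doc_tokens:
        return 0.0
    # Term frequency: share of the document's tokens equal to `term`.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Document frequency: number of documents containing `term`.
    df = sum(1 for tokens in corpus if term in tokens)
    # Inverse document frequency: rarer terms get a larger weight.
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf
```

A term that occurs in few documents therefore weighs more than one that occurs in most of them, which is why a rare keyword found only in a snippet's surrounding metadata can still rank that snippet highly.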
Each code example is represented as a document with several parts: title, tag,
answer, question, code, and social metadata. Example Overflow uses the following
formula to calculate the score of each document representing a code example:
$$S_{doc} = \left[ W_{title} S_{title} + W_{tag} S_{tag} + W_{answer} S_{answer} + W_{question} S_{question} + W_{code} S_{code} \right] S_{metadata}$$
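As a worked illustration of the formula, the sketch below combines per-field scores into a document score. The function name `document_score` and the dictionary layout are hypothetical, and the source does not give the weight values, so any numbers passed in are placeholders rather than Example Overflow's settings.

```python
# The five text parts of a document, as listed in the formula.
FIELDS = ("title", "tag", "answer", "question", "code")


def document_score(field_scores, weights, metadata_score):
    """Weighted sum of the per-field scores, scaled by the metadata score.

    `field_scores` and `weights` map field names to numbers; a field
    missing from `field_scores` is treated as scoring 0 (an assumption).
    """
    weighted_sum = sum(
        weights[field] * field_scores.get(field, 0.0) for field in FIELDS
    )
    return weighted_sum * metadata_score
```

For example, with placeholder weights of 2.0 for the title, 3.0 for the code, and 1.0 for the other fields, a document with a title score of 0.5, a code score of 1.0, and a metadata score of 2.0 gets (2.0 × 0.5 + 3.0 × 1.0) × 2.0 = 8.0. Because the metadata score multiplies the whole sum, it scales the textual relevance instead of competing with it as another additive term.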
15.5.3 Discussion
Searching for code examples is possible using Stack Overflow directly. However,
using designated code search tools on top of Stack Overflow may provide better
results in terms of streamlining the various activities involved in example-centric
development (search, evaluation, and embedding). Designated tools may also
introduce search mechanisms optimized for code search; they can minimize the
context switch involved in leaving the IDE (as implemented in Blueprint [11],
Strathcona [18], and recently Seahawk [4]), and may even use static analysis techniques