To parse out identifiers and comments from the code snippets, we use the Eclipse
abstract syntax tree (AST) parser.[7] We chose this API for Juicy because it is a robust
incremental compiler. Within the Eclipse workbench, the AST parser is capable of
parsing code as it is typed and compiling classes as they are saved. For Juicy, it
provides two features that are particularly helpful: input type selection and error
handling. Input type selection allows the AST parser to be called with a flag that
specifies whether the input code is complete source code, a block, or a line of code.
This allows the parser to produce more accurate output. If the flag is set incorrectly
for an input snippet, the parser will produce errors and we can try again with a
different flag.
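The retry strategy described above can be sketched as a loop over input kinds. This is a minimal illustration, not Juicy's actual code: the `Kind` enum, the `firstCleanKind` helper, and the `errorCounter` callback are hypothetical names, and the order in which kinds are tried is an assumption. In Eclipse JDT the flag is set through `ASTParser.setKind`, using constants such as `ASTParser.K_COMPILATION_UNIT` and `ASTParser.K_STATEMENTS`; the callback here stands in for a call to the real parser so that the sketch stays self-contained.

```java
import java.util.Optional;
import java.util.function.ToIntBiFunction;

final class SnippetKindFallback {
    /** Hypothetical input kinds mirroring the three cases named in the text. */
    enum Kind { COMPLETE_SOURCE, BLOCK, LINE }

    /**
     * Tries each input kind in declaration order (an assumed order) and returns
     * the first kind for which the parser reports no errors.
     *
     * @param snippet      the code snippet to parse
     * @param errorCounter stand-in for the real parser: returns the number of
     *                     errors produced when parsing the snippet as the given kind
     */
    static Optional<Kind> firstCleanKind(String snippet,
                                         ToIntBiFunction<String, Kind> errorCounter) {
        for (Kind kind : Kind.values()) {
            if (errorCounter.applyAsInt(snippet, kind) == 0) {
                return Optional.of(kind);
            }
        }
        // No kind parsed cleanly; the caller decides how to handle the snippet.
        return Optional.empty();
    }
}
```

Passing the parser in as a callback keeps the fallback logic separate from the parser itself, so the same loop could wrap any parser that reports an error count.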
We collected ten types of structural information from code snippets, which are
shown in Table 14.3. The first nine types are mainly identifier declarations and
invocations, which can be generalized into package, class, variable, and method
information.
Extracted identifier types
 1. Package
 2. Import
 3. Class declaration
 4. Class used
 5. Extending and implementing class
 6. Return type
 7. Variable declaration
 8. Method declaration
 9. Method invocation
10. Comment
Table 14.3: Extracted identifier types from parser
Another metadata field was added to store all the English words in the
identifiers, which were found by dividing the identifiers according to internal
capitalization. The scheme is also known as “camel case,” because uppercase letters
in an identifier are taller than the lowercase ones, giving the identifier the
appearance of camel-like humps. As specified in Sun's Java Code Conventions, camel
case is the recommended standard for aggregating English words to form a meaningful
identifier. For this metadata field, we excluded Java keywords such as 'class,' 'for,'
'new,' and 'void.'
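Splitting an identifier on its internal capitalization can be sketched with a regular expression. This is a minimal illustration under stated assumptions rather than the implementation used in Juicy: the class name `CamelCaseSplitter`, the lowercasing of the resulting words, and the handling of acronyms such as "XML" are our own choices, and the keyword set contains only the four examples named above, not the full list of Java keywords.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class CamelCaseSplitter {
    // Illustrative subset only: the four keywords named in the text.
    private static final Set<String> KEYWORDS = Set.of("class", "for", "new", "void");

    // Zero-width boundaries:
    //  1. a lowercase letter or digit followed by an uppercase letter (get|User)
    //  2. an uppercase run followed by a capitalized word (XML|Parser)
    private static final String BOUNDARY =
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    /** Splits an identifier into lowercased words, dropping excluded keywords. */
    static List<String> split(String identifier) {
        List<String> words = new ArrayList<>();
        for (String part : identifier.split(BOUNDARY)) {
            // Lowercasing is an assumption made to simplify keyword matching.
            String word = part.toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !KEYWORDS.contains(word)) {
                words.add(word);
            }
        }
        return words;
    }
}
```

With this sketch, `CamelCaseSplitter.split("getUserName")` yields `[get, user, name]`, and `CamelCaseSplitter.split("newInstance")` yields `[instance]` because "new" is filtered out as a keyword.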
The last piece of information to be parsed from code snippets is code comments.
Search engines can benefit from comments because they provide information about
the context of the code snippets and also contribute potential matches to search
terms. Having more keywords associated with a piece of source code increases the
chance that a keyword in a query matches a keyword related to that code.
Therefore, javadoc comments, line comments, and block comments were collected and
treated as textual information in Juicy.
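To illustrate the three comment categories, the sketch below collects them with a regular expression. This swaps in a simpler technique than the AST parser used in Juicy, and it is an approximation: it would misclassify comment-like text inside string literals, for example. The class name `CommentCollector` and its `Kind` enum are hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentCollector {
    enum Kind { JAVADOC, BLOCK, LINE }

    // Matches either a line comment (to end of line) or a /* ... */ comment.
    // DOTALL lets block comments span several lines.
    private static final Pattern COMMENT =
            Pattern.compile("//[^\\r\\n]*|/\\*.*?\\*/", Pattern.DOTALL);

    /** Groups the raw comment text found in the code by comment kind. */
    static Map<Kind, List<String>> collect(String code) {
        Map<Kind, List<String>> found = new EnumMap<>(Kind.class);
        for (Kind kind : Kind.values()) {
            found.put(kind, new ArrayList<>());
        }
        Matcher matcher = COMMENT.matcher(code);
        while (matcher.find()) {
            String raw = matcher.group();
            Kind kind;
            if (raw.startsWith("//")) {
                kind = Kind.LINE;
            } else if (raw.startsWith("/**") && raw.length() > 4) {
                // "/**/" is an empty block comment, not a javadoc comment.
                kind = Kind.JAVADOC;
            } else {
                kind = Kind.BLOCK;
            }
            found.get(kind).add(raw);
        }
        return found;
    }
}
```

The collected text could then be indexed alongside the identifier words, which is the sense in which comments are treated as textual information.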
[7] http://www.eclipse.org/articles/article.php?file=Article-JavaCodeManipulation_AST/index.html
 