Novel and Applied Algorithms in a Search Engine for Java Code Snippets - Finding Source Code on the Web for Remix and Reuse - page 278


To parse out identifiers and comments from the code snippets, we use the Eclipse

abstract syntax tree (AST) parser. 7 We chose to use this API in Juicy because it is a robust

incremental compiler. Within the Eclipse workbench, the AST parser is capable of

parsing code as it is typed and compiling classes as they are saved. For Juicy, it

provides two features that are particularly helpful: input type selection and error

handling. Input type selection allows the AST parser to be called with a flag that

specifies whether the input code is complete source code, a block, or a line of code.

This allows the parser to produce more accurate output. If the flag is set incorrectly

for an input snippet, the parser will produce errors and we can try again with a

different flag.
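The retry strategy described above can be sketched as follows. This is only an illustration under stated assumptions: `InputKind`, `ParseResult`, `Parser`, and `parseWithFallback` are hypothetical names that stand in for the real parser, and the order in which kinds are tried is our own choice. In Eclipse JDT the flag itself is set through `ASTParser.setKind`, with constants such as `ASTParser.K_COMPILATION_UNIT` and `ASTParser.K_STATEMENTS`; the sketch does not call that API, so it runs without the Eclipse libraries.

```java
import java.util.Optional;

final class SnippetParsing {
    /** The three input kinds named in the text; the enum itself is hypothetical. */
    enum InputKind { COMPLETE_SOURCE, BLOCK, LINE }

    /** Hypothetical outcome of one parse attempt. */
    static final class ParseResult {
        final InputKind kind;
        final int errorCount;

        ParseResult(InputKind kind, int errorCount) {
            this.kind = kind;
            this.errorCount = errorCount;
        }
    }

    /** Stand-in for the real parser; this is not an Eclipse interface. */
    interface Parser {
        ParseResult parse(String snippet, InputKind kind);
    }

    /**
     * Tries each input kind in declaration order and returns the first
     * error-free result, or an empty Optional if every kind reports errors.
     */
    static Optional<ParseResult> parseWithFallback(String snippet, Parser parser) {
        for (InputKind kind : InputKind.values()) {
            ParseResult result = parser.parse(snippet, kind);
            if (result.errorCount == 0) {
                return Optional.of(result);
            }
        }
        return Optional.empty();
    }
}
```

A caller would wrap the actual parser in the `Parser` interface and treat an empty result as a snippet that could not be parsed under any flag.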

We collected ten types of structural information from code snippets, which are
shown in Table 14.3. The first nine types are mainly identifier declarations and
invocations, which can be generalized into package, class, variable, and method
information.

Extracted identifier types

Package                Import                             Class declaration
Class used             Extending and implementing class   Return type
Variable declaration   Method declaration                 Method invocation
Comment

Table 14.3: Extracted identifier types from parser

Another metadata field was added to store all the English words in the identifiers,
which were found by dividing the identifiers according to internal capitalization.
The scheme is also known as “camel case,” because uppercase letters in an identifier
are taller than the lowercase ones, giving the identifier the appearance of
camel-like humps. As specified in the Sun Java Coding Convention, camel case is the
recommended standard for aggregating English words to form a meaningful identifier.
For this metadata field, we excluded Java keywords such as 'class,' 'for,' 'new,'
and 'void.'
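Splitting on internal capitalization can be sketched with a single regular expression. The class and method names are hypothetical, and several details are our assumptions and not taken from Juicy: the keyword set is abbreviated, words are lowercased, keywords are dropped after splitting, and runs of capitals such as "XML" are kept together as one word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class IdentifierWords {
    // Abbreviated keyword list for illustration; a full version would
    // cover every reserved word of the Java language.
    private static final Set<String> JAVA_KEYWORDS = new HashSet<>(Arrays.asList(
            "class", "for", "new", "void", "int", "if", "else", "return"));

    // Zero-width boundaries: lowercase or digit followed by uppercase
    // ("file|Name"), and the end of an acronym followed by a capitalized
    // word ("XML|Document").
    private static final String BOUNDARY =
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    /** Splits a camel-case identifier into lowercase words, dropping keywords. */
    static List<String> split(String identifier) {
        List<String> words = new ArrayList<>();
        for (String part : identifier.split(BOUNDARY)) {
            String word = part.toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !JAVA_KEYWORDS.contains(word)) {
                words.add(word);
            }
        }
        return words;
    }
}
```

For example, `IdentifierWords.split("getFileName")` yields `[get, file, name]`, and `IdentifierWords.split("voidHandler")` yields `[handler]` because "void" is filtered out.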

The last piece of information to be parsed from code snippets is code comments.
Search engines can benefit from comments because they provide information about
the context of the code snippets and also contribute potential matches to search
terms. The more keywords that are associated with a piece of source code, the more
chances there are that a keyword in a query matches it. Therefore, javadoc comments,
line comments, and block comments were collected and treated as textual information
in Juicy.
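To make the three comment types concrete, the sketch below collects them with a small hand-written scanner. This is a swapped-in technique for illustration, not the Eclipse AST parser that Juicy uses. All names are hypothetical, and the scanner skips ordinary string and character literals but does not handle Java text blocks.

```java
import java.util.ArrayList;
import java.util.List;

final class CommentScanner {
    enum Kind { JAVADOC, BLOCK, LINE }

    /** One comment with its delimiters removed and whitespace trimmed. */
    static final class Comment {
        final Kind kind;
        final String text;

        Comment(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** Returns all comments in source order. */
    static List<Comment> scan(String src) {
        List<Comment> out = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Skip literals so that "//" inside a string is not a comment.
                i = skipLiteral(src, i, c);
            } else if (src.startsWith("//", i)) {
                int end = src.indexOf('\n', i);
                if (end < 0) end = n;
                out.add(new Comment(Kind.LINE, src.substring(i + 2, end).trim()));
                i = end;
            } else if (src.startsWith("/*", i)) {
                int close = src.indexOf("*/", i + 2);
                int end = close < 0 ? n : close;
                // "/**" opens a javadoc comment, except for the empty "/**/".
                boolean javadoc = src.startsWith("/**", i) && close != i + 2;
                int start = i + (javadoc ? 3 : 2);
                Kind kind = javadoc ? Kind.JAVADOC : Kind.BLOCK;
                out.add(new Comment(kind, src.substring(start, end).trim()));
                i = close < 0 ? n : close + 2;
            } else {
                i++;
            }
        }
        return out;
    }

    /** Returns the index just past the literal that starts at 'start'. */
    private static int skipLiteral(String src, int start, char quote) {
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2; // skip the escaped character
                continue;
            }
            if (c == quote || c == '\n') {
                return i + 1;
            }
            i++;
        }
        return src.length();
    }
}
```

The collected `text` values could then be indexed as plain textual information alongside the identifier fields.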

7 http://www.eclipse.org/articles/article.php?file=Article-JavaCodeManipulation_AST/index.html .
