Fig. 14.5 Results presentation in Juicy
1. Best Matched Paragraph. This is the paragraph with the highest frequency of
matched topic keywords between the paragraph and its related code snippet. This
paragraph can appear anywhere on the web page.
2. Text Segment Above Snippet. Text segments usually contain several paragraphs
and are bounded by two code snippets. These are identified during the snippet
extraction process.
3. Last Paragraph. We considered using the last paragraph of a text segment,
which appears immediately above a code snippet.
4. Page Title + Text Segment. This candidate included the page title and the text
segment. This group provides the largest set of data related to a code snippet. We
considered this combination, because it was used to index the snippet.
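As a rough illustration of the first candidate, the sketch below picks the paragraph in which the snippet's topic keywords occur most often. It is a minimal sketch under stated assumptions, not the tool's actual implementation: the names `tokenize` and `best_matched_paragraph`, the regular-expression tokenizer, and the sample data are all hypothetical, and the snippet's topics are assumed to be available already as a set of keywords.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a simple stand-in for real topic extraction)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def best_matched_paragraph(paragraphs, snippet_topics):
    """Return the paragraph with the highest count of the snippet's topic keywords.

    Assumes `paragraphs` is a non-empty list of strings and `snippet_topics`
    is an iterable of single-word keywords (hypothetical input format).
    """
    topics = {t.lower() for t in snippet_topics}

    def score(paragraph):
        counts = Counter(tokenize(paragraph))
        # Sum the occurrences of every topic keyword in this paragraph.
        return sum(counts[t] for t in topics)

    # max() keeps the earliest paragraph when scores tie.
    return max(paragraphs, key=score)


# Hypothetical page content and snippet topics, for illustration only.
paragraphs = [
    "This page covers file handling in general.",
    "Use a reader to open the file and read each line of the file.",
    "The snippet below shows the idea.",
]
snippet_topics = {"file", "reader", "line"}
best = best_matched_paragraph(paragraphs, snippet_topics)
```

Because the score depends only on keyword counts, the selected paragraph can come from anywhere on the page, which matches the description of the first candidate.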
Figure 14.6 shows the percentage of the 200 examples that had one or more
topics in common between the code snippet and the candidate text. The two
candidates that contained the most text also had more topics, and consequently
had a higher percentage of matches. Eighty-three percent of the code snippets
had topics in common with Page Title + Text Segment, and 80.54 % had topics in
common with the Text Segment. The individual paragraphs, Best Matched Paragraph
and Last Paragraph, fared less well, because they contained less text and fewer
topics: the Best Matched Paragraph matched in 66.81 % of the examples and the
Last Paragraph in 59.26 %. In the end, we elected to use the Best Matched
Paragraph, because it offered a reasonable combination of descriptiveness and
brevity.
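The percentage described above can be computed as the share of examples whose snippet and candidate text have at least one topic in common. The following is a minimal sketch, assuming each example is given as a pair of topic sets; the function name `match_percentage` and the sample data are hypothetical and do not reproduce the 200 examples in the evaluation.

```python
def match_percentage(examples):
    """Percentage of examples with one or more topics in common.

    `examples` is a list of (snippet_topics, candidate_topics) pairs,
    each element being a set of topic keywords (assumed input format).
    """
    if not examples:
        return 0.0
    # An example counts as a match when the two topic sets intersect.
    matched = sum(1 for snippet, candidate in examples if snippet & candidate)
    return 100.0 * matched / len(examples)


# Hypothetical data: three of the four examples share at least one topic.
sample = [
    ({"file", "reader"}, {"file", "stream"}),
    ({"socket"}, {"socket", "port"}),
    ({"thread"}, {"lock", "mutex"}),
    ({"regex", "match"}, {"match"}),
]
percentage = match_percentage(sample)
```

A candidate with more text tends to carry more topics, so its sets intersect with the snippet's topics more often; this is consistent with the longer candidates scoring higher.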