Image Processing Reference
3.2 Chinese Lexical Chain Processing
The fundamental idea of building CLCP is a bottom-up concatenating process based on the significance degree of the distribution rate, which extracts the most meaningful LCs as anecdotes from a string. We treat a news document as a long string composed of a series of characters and punctuation marks.
Most traditional studies identify words from the whole context and store all processed tokens for further processing, as shown in Figure 2. In this way, even a moderate-sized document may require hundreds of thousands of tokens, which consumes a great deal of memory and may incur unacceptable run-time overhead. Because the number of distinct tokens is far smaller than the total number of tokens in the document, we adopted a sharing concept that allows identical tokens to be reused.
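The token-sharing idea can be sketched as follows. This is a minimal illustration, not the paper's implementation; the names `CharToken` and `TokenPool` are assumptions introduced here.

```python
class CharToken:
    """A single character treated as a basic, shareable unit."""
    def __init__(self, ch):
        self.ch = ch


class TokenPool:
    """Flyweight-style factory: returns the same CharToken object
    every time the same character is requested, so identical tokens
    are stored only once."""
    def __init__(self):
        self._pool = {}

    def get(self, ch):
        if ch not in self._pool:
            self._pool[ch] = CharToken(ch)
        return self._pool[ch]


pool = TokenPool()
text = "新聞新聞"  # repeated characters in the document
tokens = [pool.get(ch) for ch in text]

assert tokens[0] is tokens[2]   # identical characters reuse one shared object
assert len(pool._pool) == 2     # only distinct characters are stored
```

Because only distinct characters are materialized, memory grows with the alphabet actually used rather than with document length.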
We considered each character as a basic unit from which compounds are built as composites, which in turn can be grouped to form larger compounds. Since characters and compounds are treated uniformly, the application remains simple.
FIGURE 2 Traditional document processing.
To do so, we adopted the flyweight and composite design patterns proposed by the GoF (Gang of Four) [29]. Figure 3 shows the flyweight as a shared object that can be used throughout the whole context simultaneously. Figure 4 represents the part-whole hierarchy of texts and the way recursive composition is used. The flyweight pattern supports the efficient use of large numbers of fine-grained objects, while the composite pattern makes it easier to add new components to the application.
FIGURE 3 Flyweight design concept.
FIGURE 4 Composite text structure.
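The part-whole text hierarchy can be sketched with a standard composite structure: characters are leaves, compounds are composites, and both expose the same interface so clients treat them uniformly. The class names below are illustrative assumptions, not taken from the paper.

```python
class TextComponent:
    """Common interface for characters and compounds (composite pattern)."""
    def text(self):
        raise NotImplementedError


class Character(TextComponent):
    """Leaf: a single character."""
    def __init__(self, ch):
        self.ch = ch

    def text(self):
        return self.ch


class Compound(TextComponent):
    """Composite: holds characters or other compounds recursively."""
    def __init__(self, parts):
        self.parts = list(parts)

    def text(self):
        return "".join(p.text() for p in self.parts)


word = Compound([Character("中"), Character("文")])
phrase = Compound([word, Character("系")])  # compounds nest recursively

assert phrase.text() == "中文系"
```

Because `Compound` accepts any `TextComponent`, larger compounds are formed by the same grouping operation at every level of the hierarchy.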
The CLCP steps are depicted in Figure 5, and Figure 6 displays a fragment of the code for the iterative concatenating process. A detailed description with an example is given next.
FIGURE 5 CLCP steps.
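One plausible shape of the bottom-up concatenating process can be sketched as below. This excerpt does not reproduce the paper's significance measure on distribution rates, so raw adjacent-pair frequency is used here as a stand-in criterion; the function name and threshold are assumptions.

```python
from collections import Counter

def concatenate_pass(units, min_count=2):
    """One bottom-up pass: merge every occurrence of the most frequent
    adjacent pair of units into a single compound unit.
    (Pair frequency stands in for the paper's significance measure.)"""
    pairs = Counter(zip(units, units[1:]))
    if not pairs:
        return units, False
    (a, b), count = pairs.most_common(1)[0]
    if count < min_count:          # nothing significant enough to merge
        return units, False
    merged, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and units[i] == a and units[i + 1] == b:
            merged.append(a + b)   # concatenate the pair into a compound
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged, True


units = list("ABABCAB")
changed = True
while changed:                     # iterate until no pair qualifies
    units, changed = concatenate_pass(units)
# the pair "AB" occurs three times and is merged into one compound unit
```

Each pass shortens the string by replacing a recurring pair with a compound, so repeated iteration grows longer and longer candidate chains from the character level upward.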