3.2 Chinese Lexical Chain Processing
The fundamental idea of building the CLCP is a bottom-up concatenating process, based on the
significance degree of the distribution rate, that extracts the most meaningful LCs as anecdotes
from a string. We treated a news document as a long string composed of a series of characters
and punctuation marks.
Most traditional studies identify words from the whole context and store all the processed
tokens for further processing, as shown in Figure 2. In this way, even a moderate-sized document
may require hundreds of thousands of tokens, which consumes a great deal of memory and may
incur unacceptable run-time overhead. Because the number of distinct tokens is smaller than
the total number of tokens in the document, we adopted a sharing concept that allows identical
tokens to be reused.
We considered each character as a basic unit from which to build compounds as composites,
which in turn can be grouped into larger compounds. Since characters and compounds are
treated uniformly, the application stays simple.
FIGURE 2
Traditional document processing.
By doing so, we adopted the flyweight and composite design patterns: a flyweight is a
shared object that can be used throughout the whole context simultaneously.
Figure 4
represents the part-whole hierarchy of texts and the way recursive composition is used.
Applying the flyweight design pattern supports the efficient use of large numbers of
fine-grained objects; applying the composite design pattern makes it easier to add new
components to the application.
FIGURE 3
Flyweight design concept.
FIGURE 4
Composite text structure.
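The extensibility benefit of the composite structure can be illustrated with a short sketch: client code walks the part-whole hierarchy through one common interface, so a new leaf type plugs in without changing anything else. All names here, including the added `Punctuation` class, are hypothetical illustrations.

```python
class Component:
    """Common interface for leaves and compounds in the text hierarchy."""
    def text(self):
        raise NotImplementedError

class Character(Component):
    def __init__(self, ch):
        self.ch = ch
    def text(self):
        return self.ch

class Compound(Component):
    def __init__(self, parts):
        self.parts = parts
    def text(self):
        # Recursive composition: each part may be a leaf or another compound.
        return "".join(p.text() for p in self.parts)

# A new component type requires no changes to Compound or to client code:
class Punctuation(Component):
    def __init__(self, mark):
        self.mark = mark
    def text(self):
        return self.mark

sentence = Compound([Compound([Character("新"), Character("聞")]),
                     Punctuation("。")])
print(sentence.text())  # 新聞。
```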
The CLCP is built through an iterative concatenating process, shown in Figure 5. A detailed
description with an example is addressed next.
FIGURE 5
CLCP steps.