Fig. 1. Document processing framework
not structured and contained in textual format (for example, electronic or paper
documents) without any support that would make the contents machine-readable
and processable. To this end, we have defined a document processing framework
that analyzes texts and automatically extracts relevant information, concepts and
complex relations, organizing the originally unstructured information in a
structured fashion.
Figure 1 depicts the framework schema. It is composed of three main blocks:
(i) a preprocessing module that extracts textual elements from the input documents;
(ii) a transformation module that applies to the textual elements a set of transformation rules, identified by the parameter configuration given as input;
(iii) a postprocessing module that provides a proper encoding for the transformed textual elements so that they can be used in different application areas.
We have formalized the document processing framework as follows:
let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of textual documents; let $T^* = \{t^*_1, t^*_2, \ldots, t^*_k\}$ be the set of data structures (textual documents, XML, RDF, etc.); and finally let $O = \{o_1, o_2, \ldots, o_s\}$ be the set of targets defined by parameter configurations. Each target $o_i$ represents, in turn, a set of configuration parameters aimed at selecting the appropriate algorithms and techniques for document transformation, according to the context in which the framework has to be instantiated. Such a parameter configuration can act on all three modules that make up the framework. In particular, each module takes as input a subset of the configuration parameters, which select specific algorithms and, where needed, data inputs for a selected algorithm. So let
$A = \{\alpha_1, \alpha_2, \ldots, \alpha_h\}$, $B = \{\beta_1, \beta_2, \ldots, \beta_l\}$ and $C = \{\gamma_1, \gamma_2, \ldots, \gamma_r\}$ be the input parameters of the preprocessing, transformation and postprocessing modules, respectively; the set $O$ can then be defined as $O \subseteq A \times B \times C$. The framework is defined as a function $f : T \times O \to T^*$ that, given a target $o_i$ and a document $t_j$, implements a set of transformation rules (identified by $o_i$) and provides as output an element $t^*_j \in T^*$.
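The formalization above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the registry names (`PREPROCESSORS`, `TRANSFORMERS`, `POSTPROCESSORS`), the `Target` class and the concrete parameter values such as "whitespace" or "lowercase" are all hypothetical, chosen only to show how a target $o_i \in A \times B \times C$ could select one algorithm per module and how $f$ could chain the three modules.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical parameter sets A, B and C: each key stands for one parameter
# value and maps to the algorithm it selects. The source names no algorithms.
PREPROCESSORS: Dict[str, Callable[[str], List[str]]] = {
    "whitespace": lambda text: text.split(),
    "sentences": lambda text: [
        s.strip() for s in re.split(r"[.!?]+", text) if s.strip()
    ],
}
TRANSFORMERS: Dict[str, Callable[[List[str]], List[str]]] = {
    "lowercase": lambda elems: [e.lower() for e in elems],
    "identity": lambda elems: list(elems),
}
POSTPROCESSORS: Dict[str, Callable[[List[str]], str]] = {
    "json": lambda elems: json.dumps(elems),
    "lines": lambda elems: "\n".join(elems),
}


@dataclass(frozen=True)
class Target:
    """A target o_i, modelled here as one element of A x B x C."""
    alpha: str  # preprocessing parameter (element of A)
    beta: str   # transformation parameter (element of B)
    gamma: str  # postprocessing parameter (element of C)


def f(document: str, target: Target) -> str:
    """Sketch of f : T x O -> T*, applying the three modules in sequence."""
    elements = PREPROCESSORS[target.alpha](document)   # (i) preprocessing
    transformed = TRANSFORMERS[target.beta](elements)  # (ii) transformation
    return POSTPROCESSORS[target.gamma](transformed)   # (iii) postprocessing


if __name__ == "__main__":
    o = Target(alpha="whitespace", beta="lowercase", gamma="json")
    print(f("Hello World", o))  # prints ["hello", "world"]
```

In this sketch, instantiating the framework in a different context amounts to choosing a different `Target`; the function `f` itself does not change.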