Fig. 1. Document processing framework
not structured and contained in textual format (for example, electronic or paper
documents), without any support that would make the contents machine-readable
and processable. To this end, we have defined a document processing framework
that analyzes texts and automatically extracts relevant information, concepts
and complex relations, organizing the originally unstructured information in a
structured fashion.
Figure 1 depicts the framework schema. It is composed of three main blocks:
(i) a preprocessing module, which extracts textual elements from the input
documents; (ii) a transformation module, which applies to the textual elements
a set of transformation rules identified by the parameter configuration given
as input; (iii) a postprocessing module, which provides a proper encoding for
the transformed textual elements so that they can be used in different
application areas. We have formalized the document processing framework as
follows:
let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of textual documents; let
$T' = \{t'_1, t'_2, \ldots, t'_k\}$ be the set of data structures (textual
documents, XML, RDF, etc.); and finally let $O = \{o_1, o_2, \ldots, o_s\}$ be
the set of targets defined by parameter configurations. Each target $o_i$
represents, in turn, a parameter configuration aimed at selecting the
appropriate set of algorithms and techniques for document transformation,
according to the context in which the framework is to be instantiated. Such a
parameter configuration can act on all three modules that make up the
framework. In particular, each module takes as input a subset of the
configuration parameters, which select specific algorithms and, where needed,
the input data for a selected algorithm. So, letting
$A = \{\alpha_1, \alpha_2, \ldots, \alpha_h\}$,
$B = \{\beta_1, \beta_2, \ldots, \beta_l\}$ and
$C = \{\gamma_1, \gamma_2, \ldots, \gamma_r\}$ be the input parameters of the
preprocessing, transformation and postprocessing modules respectively, the set
$O$ can be defined as $O \subseteq A \times B \times C$. The framework is
defined as a function $f : T \times O \rightarrow T'$ that, given a target
$o_i$ and a document $t_j$, implements a set of transformation rules
(identified by $o_i$) and provides as output an element $t'_j \in T'$.
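As an illustration only, the function f can be sketched in Python. The sketch
assumes that each target o_i is a triple with one parameter per module, and
that each parameter is simply a callable standing in for a selected algorithm.
The names Target, split_on_whitespace, lowercase and encode_json are
hypothetical and are not taken from the framework itself.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical module signatures (assumptions, not from the source text).
Preprocess = Callable[[str], List[str]]       # document -> textual elements
Transform = Callable[[List[str]], List[str]]  # elements -> transformed elements
Postprocess = Callable[[List[str]], str]      # transformed elements -> encoding


@dataclass(frozen=True)
class Target:
    """A target o_i, assumed here to be a triple (alpha, beta, gamma)."""
    alpha: Preprocess    # parameter of the preprocessing module
    beta: Transform      # parameter of the transformation module
    gamma: Postprocess   # parameter of the postprocessing module


def f(document: str, target: Target) -> str:
    """Map a document t_j and a target o_i to an output data structure."""
    elements = target.alpha(document)      # (i) preprocessing
    transformed = target.beta(elements)    # (ii) transformation
    return target.gamma(transformed)       # (iii) postprocessing


# Illustrative stand-ins for the algorithms a configuration might select.
def split_on_whitespace(document: str) -> List[str]:
    return document.split()


def lowercase(elements: List[str]) -> List[str]:
    return [element.lower() for element in elements]


def encode_json(elements: List[str]) -> str:
    return json.dumps({"tokens": elements})


o_1 = Target(alpha=split_on_whitespace, beta=lowercase, gamma=encode_json)
result = f("Document Processing Framework", o_1)
print(result)  # {"tokens": ["document", "processing", "framework"]}
```

Under these assumptions, instantiating the framework in a different context
amounts to building a different Target, for example one whose gamma emits XML
or RDF instead of JSON, while f itself stays unchanged.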