An Innovative Framework for Securing
Unstructured Documents
Flora Amato, Valentina Casola, Antonino Mazzeo, and Sara Romano
Dipartimento di Informatica e Sistemistica
University of Naples Federico II, Napoli, Italy
{flora.amato,casolav,mazzeo,sara.romano}@unina.it
Abstract. The coexistence of structured and unstructured data is a
major limitation for document management in public and private
contexts. To identify and protect specific resources within monolithic
documents, we have adopted different techniques that analyze texts and
automatically extract relevant information. In this paper we propose an
innovative framework for data transformation that is based on a semantic
approach and can be adapted to many different contexts; in particular,
we illustrate the applicability of such a framework to the formalization
and protection of e-health medical records.
Keywords: Knowledge extraction, document transformation, fine-grain
document protection.
1 Introduction
The adoption of innovative systems for electronic document management is
today very common in many domains, for example e-government, e-health and
many other professional and social fields. In such domains there is the
need to manage the coexistence of traditional unstructured documents, mainly
stored on paper or digital supports, with new, structured documents built with
modern open standards and technologies (opendoc, XML, etc.). In practice,
the innovative management and elaboration techniques are useless for
unstructured documents and, even if desirable, operations like semantic
search, knowledge extraction, text elaboration and so on are impossible
unless a proper transformation of unstructured data is performed.
Furthermore, people working in “traditional contexts” (medical, juridical,
humanistic, etc.) are not interested in structuring their data and documents
(such as textual documents, e-mails, web pages, multimedia files); they treat
document production as a monolithic block, without understanding how data
management and protection could improve when information is well structured.
For example, a lawyer writes juridical records containing both sensitive and
non-sensitive information that can be physically accessed (read and/or modified)
by different actors of the law domain; he has no competence to understand that
information systems can enforce the data management and fine-grained access
 