Structural Feature Based Anomaly Detection for Packed Executable Identification - Computational Intelligence in Security for Information Systems

implemented by means of an innovative software designed by our research group.

It takes as input a list of relevant words (those with the highest TF-IDF values) and,

exploiting a domain thesaurus [8] to identify semantic relations, clusters

them into concepts.
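
The clustering step described above can be pictured with a minimal sketch. It assumes the thesaurus can be reduced to a term-to-concept lookup; the `THESAURUS` entries and the `cluster_into_concepts` helper are hypothetical and are not taken from the thesaurus in [8] or from the authors' software.

```python
from collections import defaultdict

# Hypothetical thesaurus fragment: each (lowercased) term maps to the concept
# it belongs to. The real domain thesaurus [8] is not reproduced here.
THESAURUS = {
    "paziente": "Patient",
    "signora": "Patient",
    "ansia": "anxiety",
    "dolori allo stomaco": "stomach pain",
}


def cluster_into_concepts(words, thesaurus):
    """Group words under the concept the thesaurus assigns to them.

    Returns the clusters and the list of words the thesaurus does not cover.
    """
    clusters = defaultdict(list)
    unmatched = []
    for word in words:
        concept = thesaurus.get(word.lower())
        if concept is None:
            unmatched.append(word)
        else:
            clusters[concept].append(word)
    return dict(clusters), unmatched


clusters, unmatched = cluster_into_concepts(
    ["paziente", "Signora", "ansia", "ospedale"], THESAURUS
)
```

In this sketch, words that share a concept end up in the same cluster, while words missing from the lookup are returned separately rather than silently dropped.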

The resource identification step of the Postprocessing module uses the

classification procedure offered by the KNIME [9] workflow tool.

In order to illustrate the processing phases, let's consider a fragment of an

Italian medical record:

“La Signora si presenta con un anamnesi di precedenti ricoveri presso

differenti reparti di questo ospedale. Inquieta ed a tratti aggressiva,

manifesta un forte stato d'ansia e dolori allo stomaco. Vista la storia clinica di

patologie ansiogene del paziente, le sono stati somministrati 10mg di Maalox.”

Although the example is formulated in Italian, the concepts to which the

relevant terms refer are indicated in English.

The fragment states that “the patient presents with a history of previous

admissions to different departments of this hospital. Restless and at times

aggressive, she shows a strong state of anxiety and stomach pain. Given the

patient's clinical history of anxiety-inducing conditions, she was given 10mg of Maalox”.

Once the terms of this fragment have been extracted by the Preprocessing

module (for brevity this step is not described; the interested

reader can find details in [6]), the Transformation module extracts the

relevant terms using, as described above, a statistical measure; all the terms having a

TF-IDF value over an established threshold are selected: “paziente” (4.1),

“ansia” (4.2), “dolori allo stomaco” (3.8), “aggressiva” (3.1), “storia clinica” (4.8), and

“Maalox” (4.7). These terms are then linked to the synsets to which they

belong. Each synset refers to a concept, and each concept is then associated with a

document section, as summarized in Figure 3.
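
The threshold-based selection can be illustrated with a short sketch. The six scores are those reported above; the threshold value of 3.0, the extra term “ospedale” with its score, and the `select_relevant_terms` helper are assumptions made only for illustration, since the text does not state the threshold actually used.

```python
# TF-IDF scores reported in the text for the example fragment.
scores = {
    "paziente": 4.1,
    "ansia": 4.2,
    "dolori allo stomaco": 3.8,
    "aggressiva": 3.1,
    "storia clinica": 4.8,
    "Maalox": 4.7,
    # Hypothetical low-scoring term, added only to show a rejected candidate.
    "ospedale": 1.2,
}

THRESHOLD = 3.0  # assumed value; the source does not give the real threshold


def select_relevant_terms(scores, threshold):
    """Return (term, score) pairs whose score exceeds the threshold, best first."""
    selected = [(term, s) for term, s in scores.items() if s > threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


relevant = select_relevant_terms(scores, THRESHOLD)
```

Under these assumptions the six terms listed in the text are kept and the hypothetical low-scoring term is discarded.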

In our example we obtain the concepts associated with the extracted terms:

“Patient”, “anxiety”, “stomach pain”, “aggressive”, “Patient History”, and

“Maalox”. The relevant concepts are structured in RDF format and the list of

Fig. 2. Instanced architecture for E-Health documents processing
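
As a rough idea of what structuring a concept in RDF might look like, the sketch below serializes one concept as N-Triples lines using plain string formatting. The `http://example.org/ehealth/` namespace, the `Concept` class, the `sourceTerm` property, and the `concept_to_ntriples` helper are all hypothetical; the vocabulary actually used by the system is not given in this excerpt.

```python
# Assumed namespace; the source does not give its RDF vocabulary.
BASE = "http://example.org/ehealth/"
# Standard RDF and RDFS property IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def concept_to_ntriples(concept, term):
    """Serialize one concept and its Italian source term as N-Triples lines.

    Assumes neither string contains quotes or backslashes, which would
    need escaping in a real serializer.
    """
    subject = BASE + "concept/" + concept.replace(" ", "_")
    return [
        f"<{subject}> <{RDF_TYPE}> <{BASE}Concept> .",
        f'<{subject}> <{RDFS_LABEL}> "{concept}"@en .',
        f'<{subject}> <{BASE}sourceTerm> "{term}"@it .',
    ]


lines = concept_to_ntriples("Patient History", "storia clinica")
```

Each returned line is one subject-predicate-object statement; a production system would more likely rely on an RDF library than on manual string formatting.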
