Predicting Nosocomial Infection by Using Data Mining Technologies - New Contributions in Information Systems and Technologies


nosocomial infection. At this stage the data and variables were also aggregated into classes in order to model the problem more accurately. The following classes were created:

• Age Class: aggregation of the patients' ages into ranges that correspond to different age groups;
• Intubation: aggregation of all the invasive devices related to intubation in a single class (nasogastric intubation and nasotracheal intubation);
• Catheterization: aggregation of all the invasive devices related to catheterization in a single class (Urinary Catheter, Peripheral Catheter and Central Catheter).
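The aggregation described in the list above can be sketched as follows. This is only an illustration: the field names (age, urinary_catheter and so on) and the age cut points are hypothetical, since the excerpt does not give the column names or the age ranges that were actually used.

```python
# Hypothetical device columns; the real dataset's field names are not given.
INTUBATION_DEVICES = ("nasogastric_intubation", "nasotracheal_intubation")
CATHETER_DEVICES = ("urinary_catheter", "peripheral_catheter", "central_catheter")


def age_class(age):
    """Map an age in years to an age range (cut points are assumptions)."""
    if age < 18:
        return "0-17"
    if age < 45:
        return "18-44"
    if age < 65:
        return "45-64"
    return "65+"


def aggregate(record):
    """Collapse the individual device flags and the raw age into classes."""
    return {
        "age_class": age_class(record["age"]),
        # True if at least one intubation-related device was used.
        "intubation": any(record.get(d, False) for d in INTUBATION_DEVICES),
        # True if at least one catheter-related device was used.
        "catheterization": any(record.get(d, False) for d in CATHETER_DEVICES),
    }


example = aggregate({"age": 70, "urinary_catheter": True})
```

A record with any one of the catheter flags set is mapped to the single Catheterization class, which is the effect the aggregation is meant to have.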

Moreover, an oversampling technique was applied to the dataset in order to replicate the data associated with the occurrence of a nosocomial infection. This made it possible to obtain a number of records associated with the occurrence of a nosocomial infection close to the number of records associated with the non-occurrence of an infection. The technique consists of replicating the minority class data (the full set) in order to increase its weight, which is necessary because classifiers tend to produce more classification errors in the presence of minority classes [10]. In this work, the technique was applied because the difference between the number of forms associated with the occurrence of a nosocomial infection and the number of forms associated with the non-occurrence of an infection was very large. Without it, the signal carried by the infection occurrences could be lost because of their lower occurrence rate in the study population. After the oversampling, the dataset had 517 records. At this stage of CRISP-DM, three datasets were created: a dataset without replicated data (Approach A), a dataset with replicated data (Approach B) and a dataset with replicated data and the variable age aggregated into classes (Approach C).
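A minimal sketch of oversampling by replication, assuming a binary label and records held as dictionaries, is given below. The function name, the label key and the toy class counts are hypothetical; the excerpt reports only the final size of the dataset (517 records), not the original class counts or the exact replication factor.

```python
from collections import Counter


def oversample_minority(records, label_key):
    """Append full copies of the minority class until the classes are roughly balanced.

    Assumes exactly two distinct label values are present.
    """
    counts = Counter(r[label_key] for r in records)
    (_, majority_n), (minority_label, minority_n) = counts.most_common(2)
    minority = [r for r in records if r[label_key] == minority_label]
    # Number of extra copies of the whole minority set to append.
    extra_copies = max(round(majority_n / minority_n) - 1, 0)
    return list(records) + [dict(r) for r in minority * extra_copies]


# Toy data for illustration only: 10 forms without infection, 2 with infection.
records = [{"infection": 0} for _ in range(10)] + [{"infection": 1} for _ in range(2)]
balanced = oversample_minority(records, "infection")
```

Replicating the whole minority set, rather than sampling from it at random, keeps every infection record equally weighted. Replication should be applied only to the data used for training, otherwise copies of the same record can end up in both the training and the test sets.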

3.5 Modeling

In this study, Support Vector Machines (SVM) and Naïve Bayes (NB) were the classification techniques used to perform DM. These techniques were used to automatically induce the classification models with Oracle Data Miner¹, a SQL Developer extension that makes it possible to build, evaluate and apply DM models. Other techniques were explored, but their initial results were not satisfactory.

SVM is a powerful algorithm based on statistical learning theory. It finds the best decision planes that split the data into different sets, can be used to model complex problems and generalizes well to new data [8] [11].
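To make the idea of a separating decision plane concrete, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss. It is not the Oracle Data Miner implementation used in the study; the function names, the hyperparameters (lr, lam, epochs) and the toy points are assumptions chosen for illustration.

```python
def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=2000):
    """Fit weights w and bias b of a linear SVM; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin: move the plane.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only shrink w (regularization).
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def svm_predict(w, b, x):
    """Return the side of the decision plane on which x falls (+1 or -1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable points for illustration only.
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
```

The hinge loss only penalizes points that lie on the wrong side of the plane or too close to it, which is what pushes the plane toward the widest gap between the two sets. Non-linear problems are usually handled by replacing the dot product with a kernel, which this sketch does not cover.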

NB is based on conditional probabilities: it makes predictions using Bayes' theorem and is very fast and scalable [11].
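The way NB combines conditional probabilities through Bayes' theorem can be sketched with a small categorical classifier. This is an illustrative implementation with Laplace smoothing, not the one in Oracle Data Miner; the feature names, the label values and the six toy records are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-class feature-value frequencies (alpha must be > 0)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
    feature_values = defaultdict(set)    # feature -> distinct values seen
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(label, feature)][value] += 1
            feature_values[feature].add(value)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "feature_values": feature_values, "alpha": alpha, "n_rows": len(labels)}


def predict_proba(model, row):
    """Return P(label | row), treating features as independent given the label."""
    alpha = model["alpha"]
    scores = {}
    for label, n_label in model["class_counts"].items():
        log_p = math.log(n_label / model["n_rows"])  # prior P(label)
        for feature, value in row.items():
            seen = model["value_counts"][(label, feature)][value]
            n_values = len(model["feature_values"][feature])
            # Smoothed estimate of P(value | label).
            log_p += math.log((seen + alpha) / (n_label + alpha * n_values))
        scores[label] = log_p
    top = max(scores.values())
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: wgt / total for label, wgt in weights.items()}


# Toy records for illustration only.
rows = [
    {"catheterization": "yes", "intubation": "yes"},
    {"catheterization": "yes", "intubation": "no"},
    {"catheterization": "no", "intubation": "yes"},
    {"catheterization": "no", "intubation": "no"},
    {"catheterization": "no", "intubation": "no"},
    {"catheterization": "yes", "intubation": "no"},
]
labels = ["infection", "infection", "infection", "none", "none", "none"]
model = train_naive_bayes(rows, labels)
probs = predict_proba(model, {"catheterization": "yes", "intubation": "yes"})
```

Training is a single counting pass over the data and prediction is a handful of multiplications per class, which is why the technique is fast and scales well.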

Based on the chosen variables, several scenarios were defined to build the models:

¹ http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/dataminerworkflow-168677.html
