3.2 Business Understanding
As mentioned, the main goal of this DM study is to predict the occurrence of
nosocomial infections in patients who present certain risk factors. The DM goal
is therefore prediction through the classification of patients, which requires
DM classification techniques. Classification consists of predicting a target
variable that takes one of several predefined classes, mapping each element of
a dataset to one of those classes [4] [5].
At this stage, the problem to solve was formulated as the question “What is the
probability of a patient not belonging to a risk group for the occurrence of
nosocomial infections when intrinsic risk factors or extrinsic risk factors are present
in his/her clinical condition?”.
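One way to read this question is as a conditional probability estimated by relative frequency over labelled records. The sketch below is an illustration only and is not the method used in the study: the function name `prob_no_infection_given` and the toy records are assumptions made here, and the field names simply mirror the variable names used in this chapter.

```python
from typing import Mapping, Optional, Sequence


def prob_no_infection_given(
    records: Sequence[Mapping[str, str]], factor: str
) -> Optional[float]:
    """Estimate P(Nosocomial Infection == "No" | factor == "Yes").

    Returns None when no record has the factor present, because the
    relative frequency is undefined in that case.
    """
    exposed = [r for r in records if r.get(factor) == "Yes"]
    if not exposed:
        return None
    not_infected = sum(1 for r in exposed if r.get("Nosocomial Infection") == "No")
    return not_infected / len(exposed)


# Toy records invented for illustration (NOT taken from the study's dataset).
toy_records = [
    {"Urinary Catheter": "Yes", "Nosocomial Infection": "No"},
    {"Urinary Catheter": "Yes", "Nosocomial Infection": "Yes"},
    {"Urinary Catheter": "Yes", "Nosocomial Infection": "No"},
    {"Urinary Catheter": "No", "Nosocomial Infection": "No"},
]

estimate = prob_no_infection_given(toy_records, "Urinary Catheter")
```

A classification model generalizes this idea by conditioning on several variables at once rather than on a single risk factor.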
3.3 Data Understanding
Considering the question formulated, the data relating the variables under study
to the variable associated with nosocomial infection were selected. The study
considered a dataset of nosocomial infection data collected between
September 30, 2013 and December 31, 2013. In addition, the study considers only
data from the Medicine Units of this CHP (specialties Medicine A, Medicine B and
Medicine C).
The attributes collected with the nosocomial infection forms did not have enough
quality or relevance to be used in the DM process, so a careful selection of
attributes was performed in order to choose the representative variables for the study.
The following variables were chosen:
Nosocomial Infection: variable that dictates the result of the diagnosis
simulated by the DM techniques and has two possible values, “Yes” or “No”;
Age, Sex, Clinical Specialty, Hospitalization Days: variables that characterize
the patient and his/her hospitalization;
Risk Factors: variable that represents the presence or the absence of any
intrinsic risk factor, such as alcoholism, diabetes, coma, malnutrition, etc.;
Urinary Catheter, Peripheral Catheter, Central Catheter, Nasogastric Intubation
and Nasotracheal Intubation: variables that represent the presence or absence
of the different invasive devices, i.e. extrinsic risk factors, used during the
patient's hospitalization period.
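The selected variables can be pictured as one record per patient. The following is a minimal sketch of such a record, under assumptions not stated in the source: the class name `PatientRecord`, the snake_case field names, the boolean encoding of the presence/absence variables, and the string encoding of sex and specialty are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    # Target variable: "Yes"/"No" in the source, stored here as a bool (assumption).
    nosocomial_infection: bool
    # Variables characterizing the patient and the hospitalization.
    age: int
    sex: str
    clinical_specialty: str
    hospitalization_days: int
    # Presence (True) or absence (False) of any intrinsic risk factor.
    risk_factors: bool
    # Extrinsic risk factors: invasive devices used during hospitalization.
    urinary_catheter: bool
    peripheral_catheter: bool
    central_catheter: bool
    nasogastric_intubation: bool
    nasotracheal_intubation: bool

    def extrinsic_factor_count(self) -> int:
        """Number of invasive devices recorded for this patient."""
        return sum(
            (
                self.urinary_catheter,
                self.peripheral_catheter,
                self.central_catheter,
                self.nasogastric_intubation,
                self.nasotracheal_intubation,
            )
        )


# Example record with invented values (not from the study's dataset).
example = PatientRecord(
    nosocomial_infection=False,
    age=70,
    sex="F",
    clinical_specialty="Medicine A",
    hospitalization_days=12,
    risk_factors=True,
    urinary_catheter=True,
    peripheral_catheter=True,
    central_catheter=False,
    nasogastric_intubation=False,
    nasotracheal_intubation=False,
)
```

Keeping the target and the predictors in one typed record makes it explicit which fields a classifier would take as input and which single field it would predict.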
3.4 Data Preparation
After selecting the data and variables to use in the induction of the prediction models,
it was necessary to pre-process the data, a stage that builds a dataset containing
only the chosen variables. This stage of the CRISP-DM process reduces the
search space because it eliminates all the null and noisy values present in the data,
as well as the columns or rows of no interest to the study, leaving the dataset
with only the relevant records. After this stage, the dataset was
formed by 283 records, of which only 26 were associated with the occurrence of a