Text Mining and Patient Severity Clusters - Text Mining Techniques for Healthcare Provider Quality Determination


Figure 5. SAS text miner

Figure 6. Scoring all data

dx8,dx9,dx10,dx11,dx12,dx13,dx14,dx15);
Procedures=catx(' ',pr1,pr2,pr3,pr4,pr5,pr6,pr7,pr8,pr9,pr10,pr11,pr12,pr13,pr14,pr15);
Run;
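The CATX calls above turn each patient's fifteen diagnosis columns and fifteen procedure columns into one space-separated string, which Text Miner then treats as a document. A minimal Python sketch of the same idea follows. The `catx` helper only approximates the SAS function: it strips blanks, skips missing arguments and joins the rest with the delimiter. The record and its codes are hypothetical.

```python
def catx(delimiter, *values):
    """Rough analogue of SAS CATX: strip leading/trailing blanks from each
    argument, skip missing (None) or blank arguments, join the rest."""
    parts = []
    for v in values:
        if v is None:
            continue
        s = str(v).strip()
        if s:
            parts.append(s)
    return delimiter.join(parts)


# Hypothetical discharge record with columns dx1..dx15 and pr1..pr15,
# most of which are empty.
record = {f"dx{i}": "" for i in range(1, 16)}
record.update({f"pr{i}": "" for i in range(1, 16)})
record.update({"dx1": "41401", "dx2": " 4019", "dx3": "25000", "pr1": "3615"})

# Build one "document" per patient from the diagnosis and procedure columns.
diagnoses = catx(" ", *(record[f"dx{i}"] for i in range(1, 16)))
procedures = catx(" ", *(record[f"pr{i}"] for i in range(1, 16)))
print(diagnoses)   # 41401 4019 25000
print(procedures)  # 3615
```

Because empty columns are skipped, a patient with three diagnoses yields a three-token document rather than a string padded with blanks.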

We then build the process flow diagram shown in Figure 5 in SAS Enterprise Miner.

We generally use a sample of the data because Text Miner requires a large amount of computing resources. We then score the remaining data: a small sample defines the clusters, and the remaining data serve as test data (Figure 6). The Metadata node changes the role of the cluster variable to target. The MBR (memory-based reasoning) node builds a model that predicts the cluster. The Score node assigns a cluster value to each record in the remaining data.
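The sample-then-score workflow can be sketched outside Enterprise Miner under two assumptions. First, each record has already been reduced to a numeric vector, for example its SVD dimensions. Second, the sample's clusters were assigned by the text-mining step. The MBR node performs memory-based reasoning, which is nearest-neighbor prediction. The Python below is only a bare-bones k-nearest-neighbor stand-in, not the node's implementation. `split_sample`, `knn_predict`, the value of k and all data are hypothetical.

```python
import math
import random
from collections import Counter


def split_sample(records, fraction, seed=0):
    """Randomly split records into a small training sample and the rest."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = max(1, int(len(shuffled) * fraction))
    return shuffled[:n], shuffled[n:]


def knn_predict(train, point, k=3):
    """Return the majority cluster among the k training vectors nearest to
    `point` (Euclidean distance). `train` holds (vector, cluster) pairs."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


records = list(range(100))  # stand-ins for patient record IDs
sample_ids, remaining_ids = split_sample(records, fraction=0.1)
print(len(sample_ids), len(remaining_ids))  # 10 90

# Hypothetical two-dimensional vectors for sampled records whose clusters
# are already known, plus two unscored records from the remaining data.
sample = [((0.1, 0.2), 1), ((0.2, 0.1), 1), ((0.0, 0.3), 1),
          ((2.0, 2.1), 2), ((2.2, 1.9), 2), ((1.9, 2.0), 2)]
remaining = [(0.15, 0.15), (2.1, 2.0)]
scored = [knn_predict(sample, p, k=3) for p in remaining]
print(scored)  # [1, 2]
```

The expensive clustering runs only on the sample. Scoring each remaining record needs nothing more than distance comparisons against that sample.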

To run the process shown in Figure 6, the role of the data source to be scored must be changed to “score”. Several defaults of the Text Miner node must also be changed for it to work with ICD9 codes (Figure 7).

In particular, we must allow Text Miner to use numbers as terms, since numbers are all that are available; by default, numbers are not used. We also do not want Text Miner to attempt to distinguish between parts of speech, since all of the ICD9 codes are nouns, and we do not want to use noun groups. Once the singular value decomposition is complete, we can cluster the patient records. For a patient severity index, we want a small number of clusters, so we specify the number of clusters exactly rather than letting Text Miner choose it. In this case, we use ten diagnosis clusters.
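Taken together, these settings treat every code, digits included, as a standalone term. There is no part-of-speech tagging or noun grouping, and the number of clusters is fixed in advance. The rough Python sketch below follows those assumptions. The regular expression keeps numeric tokens, each document becomes a vector of term counts, and a plain k-means with a fixed `k` assigns the clusters. It clusters the raw counts directly and skips the singular value decomposition that Text Miner performs first. The helper names and codes are hypothetical.

```python
import re
from collections import Counter

# Keep alphanumeric tokens, digits included. A letters-only pattern such as
# [A-Za-z]+ would discard every purely numeric ICD9 code.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def term_counts(doc):
    """Each code is one term: no part-of-speech tags, no noun groups."""
    return Counter(tok.upper() for tok in TOKEN.findall(doc))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with exactly k clusters. Centroids start at the first
    k vectors, so the result is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (squared Euclidean).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = ["4019 25000", "4019 25000 2724", "41401 4111", "41401 4111 3051"]
vocab = sorted(set().union(*(term_counts(d) for d in docs)))
vectors = [[term_counts(d)[t] for t in vocab] for d in docs]
labels = kmeans(vectors, k=2)
print(labels)  # [1, 1, 0, 0]
```

Seeding the centroids with the first `k` documents keeps the sketch deterministic but is a weak choice in practice. A real run would use better seeding and, as in the text, cluster on the SVD dimensions rather than the raw counts.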

We can cluster on procedures in the same way, but we must first make a slight modification to the dataset; otherwise, the procedure clusters will overwrite the diagnosis cluster numbers. We take the dataset created by Text Miner, emws.text_documents, and define a new dataset with the code:

Data nis.diagnosisclusters (drop=_SVD_1-_SVD_100 prob1-prob100);
Set emws.text_documents;
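The drop list in this DATA step uses SAS numbered range lists, so `_SVD_1-_SVD_100` and `prob1-prob100` each remove one hundred variables. The text warns that clustering on procedures would otherwise overwrite the cluster numbers. For that reason, the Python sketch below also copies the diagnosis cluster into a separately named column, using one dictionary per record. That rename is an assumption, since the DATA step shown does not include one, and the names `cluster` and `dx_cluster` are hypothetical.

```python
def expand_range(prefix, first, last):
    """Expand a SAS-style numbered range list, e.g. prob1-prob100."""
    return [f"{prefix}{i}" for i in range(first, last + 1)]


# Variables removed by drop=_SVD_1-_SVD_100 prob1-prob100.
DROP = set(expand_range("_SVD_", 1, 100)) | set(expand_range("prob", 1, 100))


def keep_diagnosis_clusters(row, cluster_col="cluster", new_name="dx_cluster"):
    """Drop the SVD and probability columns, and store the diagnosis cluster
    under its own (hypothetical) name so a later procedure run cannot
    overwrite it."""
    out = {k: v for k, v in row.items() if k not in DROP and k != cluster_col}
    out[new_name] = row[cluster_col]
    return out


row = {"patient_id": 7, "Diagnoses": "4019 25000",
       "_SVD_1": 0.31, "_SVD_100": -0.02, "prob1": 0.9, "cluster": 4}
print(keep_diagnosis_clusters(row))
# {'patient_id': 7, 'Diagnoses': '4019 25000', 'dx_cluster': 4}
```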
