Case-Based Reasoning for Prognosis of Threatening Influenza Waves - Advances in Data Mining


considerations, because no knowledge about it is available and learning by

comparison with desired results is already necessary for a later step (explained in

Section 3.5).

When comparing a current course with a former one, distances between equal

assessments are valued as 0.0, between neighbouring ones as 0.5, and otherwise as 1.0

(e.g. "increase" and "sharp increase" are neighbouring). Additionally we use weights;

the values for the short-term trend are weighted with 2.0, those for the medium-term

trend with 1.5, and those for the long-term trend with 1.0. The idea is that we believe

that more recent developments should be more important than earlier ones.
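
A minimal sketch of this trend comparison follows. It assumes a five-level ordinal scale of assessments; only "increase" and "sharp increase" are named in the text, so the remaining labels, the dictionary keys, and the function names are illustrative assumptions rather than the authors' implementation.

```python
# Assumed ordinal scale of trend assessments (hypothetical labels, except
# "increase" and "sharp increase", which the text names as neighbouring).
SCALE = ["sharp decrease", "decrease", "steady", "increase", "sharp increase"]

# Weights stated in the text: short-term 2.0, medium-term 1.5, long-term 1.0.
WEIGHTS = {"short": 2.0, "medium": 1.5, "long": 1.0}


def assessment_distance(a: str, b: str) -> float:
    """Return 0.0 for equal, 0.5 for neighbouring, 1.0 for other assessments."""
    gap = abs(SCALE.index(a) - SCALE.index(b))
    if gap == 0:
        return 0.0
    if gap == 1:
        return 0.5
    return 1.0


def trend_distance(query: dict, former: dict) -> float:
    """Weighted sum of the three assessment distances."""
    return sum(
        weight * assessment_distance(query[term], former[term])
        for term, weight in WEIGHTS.items()
    )


# Hypothetical courses, for illustration only.
query = {"short": "sharp increase", "medium": "increase", "long": "steady"}
former = {"short": "increase", "medium": "increase", "long": "decrease"}
example_trend_distance = trend_distance(query, former)
```

In this example the short-term and long-term assessments are each neighbouring (0.5), so the short-term mismatch contributes 2.0 * 0.5 and the long-term mismatch 1.0 * 0.5.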

For the weekly data, we compare the incidences of the query course with those of

each former course. For each week, we compute the absolute difference between the

value of the query course and the value of the former course, divide it by the

value of the query course, and weight it with the number of the week within the

four-week course (e.g. the first week gets the weight 1.0, the current week gets 4.0).

Finally, the distance concerning the trend assessments and the distance concerning

the incidences are added.
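
The incidence part and the final sum can be sketched as follows. The text does not say how the four weekly terms are combined, so summing them is an assumption, as is raising an error when a query value is zero; all names and numbers are illustrative.

```python
def incidence_distance(query_weeks, former_weeks):
    """Week-weighted relative differences between two four-week courses.

    Each term is |query - former| / query, multiplied by the week number
    (1 for the first week, 4 for the current week). Summing the terms is
    an assumption; the text only describes the per-week computation.
    """
    if len(query_weeks) != len(former_weeks):
        raise ValueError("courses must cover the same number of weeks")
    total = 0.0
    for week, (q, f) in enumerate(zip(query_weeks, former_weeks), start=1):
        if q == 0:
            # Assumption: the text does not cover a zero query incidence.
            raise ValueError("query incidence must be non-zero")
        total += week * abs(q - f) / q
    return total


def total_distance(trend_part, incidence_part):
    """The two partial distances are simply added, as stated in the text."""
    return trend_part + incidence_part


# Hypothetical weekly incidences, oldest week first.
query_course = [100, 120, 150, 200]
former_course = [100, 90, 150, 250]
example_incidence = incidence_distance(query_course, former_course)
example_total = total_distance(1.5, example_incidence)
```

Here only weeks 2 and 4 differ, contributing 2 * 30/120 and 4 * 50/200 respectively; the trend part of 1.5 passed to `total_distance` is an arbitrary example value.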

3.3 Sufficient Similarity Check

The result of computing distances is a very long list of all former four-week courses

sorted according to their distances. For the decision whether a warning is appropriate,

this list is not really helpful, because most of the former courses are rather dissimilar

to the query course. So, the next step is to find the most similar ones. One idea

might be to use a fixed number, e.g. the first two or three courses in the sorted list.

Unfortunately, this has two disadvantages. First, even the most similar former course

might not be similar enough, and second, conversely, the fourth, fifth, etc. course

might be nearly as similar as the first one.

So, we decided to filter the most similar cases by applying sufficient similarity

conditions. So far, we use just two thresholds. First, the difference concerning the

three trend assessments between the query course and a most similar course has to be

below a threshold X. This condition guarantees similar changes over time. Second,

the difference concerning the incidences of the current week must be below a

threshold Y. This second condition guarantees a comparable current level. Of course,

further conditions concerning the incidences of the three preceding weeks might also be

used.
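
The two-threshold filter can be sketched as below. The text gives no values for X and Y, so the thresholds in the example are hypothetical, as are the `Candidate` fields and the sample distances.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A former course with its precomputed distances to the query course."""
    label: str
    trend_distance: float           # distance over the three trend assessments
    current_week_difference: float  # difference of the current-week incidences
    total_distance: float           # overall distance used for sorting


def sufficiently_similar(candidates, x, y):
    """Keep courses below both thresholds, sorted by overall distance."""
    kept = [
        c for c in candidates
        if c.trend_distance < x and c.current_week_difference < y
    ]
    return sorted(kept, key=lambda c: c.total_distance)


# Hypothetical former courses and thresholds, for illustration only.
former_courses = [
    Candidate("A", trend_distance=0.5, current_week_difference=0.05, total_distance=1.2),
    Candidate("B", trend_distance=2.0, current_week_difference=0.05, total_distance=2.6),
    Candidate("C", trend_distance=1.0, current_week_difference=0.30, total_distance=2.1),
    Candidate("D", trend_distance=1.0, current_week_difference=0.10, total_distance=1.0),
]
X, Y = 1.5, 0.2  # assumed threshold values
similar = sufficiently_similar(former_courses, X, Y)
similar_labels = [c.label for c in similar]
```

Unlike a fixed cut-off after the first two or three courses, the filtered list may be empty when no former course is similar enough, and it may contain any number of courses when many of them are nearly equally similar.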

3.4 Adaptation

We now have a usually very small list that contains only the most similar

former courses. However, the question arises how these courses can help to decide

whether early warning is appropriate. In Case-Based Reasoning, retrieval usually

provides just the most similar case, whose solution has to be adapted to fit the

query course. As in Compositional Adaptation [19], we take the solutions of several

similar cases into account.

The question is: what are the solutions of courses of incidences? The obvious idea

is to treat the continuation of a four-week course as its solution. However, in

contrast to Viboud et al. [10] we do not intend to predict future incidences, but to
