Note that if we have a sequence containing only one symbol, its information content is zero. Actually, in Equation 4.1 the frequency f_i is exactly 1 and the number of symbols, N, is 1. Substituting these values into Equation 4.1 we obtain 0:

I = -\sum_{i=1}^{1} f_i \log_2(f_i) = -1 \cdot \log_2(1) = 0

Equation 4.3 Information content for the limit case
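As a quick numerical check, a minimal sketch (assuming Equation 4.1 is the standard frequency-weighted measure, I = -∑ f_i log2(f_i)) confirms that a single-symbol sequence carries zero information:

```python
from collections import Counter
from math import log2

def information_content(sequence):
    """-sum(f_i * log2(f_i)) over the distinct symbols of the sequence,
    where f_i is the relative frequency of the i-th symbol."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# A single-symbol sequence: f_1 = 1, so the sum is -1 * log2(1) = 0
assert information_content("aaaa") == 0
# Two equally frequent symbols yield exactly one bit
assert information_content("aabb") == 1.0
```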
Given a test set of items T, the selection of s as splitting feature generates a group of subsets of T: T_{s,1}, ..., T_{s,M}, where M is the number of possible values of feature s. We define the information gain of feature s for the set T as:

I_{s,T} = I_T - \sum_{i=1}^{M} \frac{|T_{s,i}|}{|T|} \, I_{T_{s,i}}

Equation 4.4 Information gain
That is, the information gain of the split feature s is the difference between the information of the initial set of items (I_T) and the weighted sum of the information of the subsets of items induced by the split feature.
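The gain computation just described can be sketched directly; the dictionary-based items and the accessor functions below are illustrative assumptions, not part of the source:

```python
from collections import Counter
from math import log2

def information(items, category):
    """I_T: information of the category labels of a set of items."""
    counts = Counter(category(item) for item in items)
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(items, feature, category):
    """I_{s,T} = I_T minus the weighted sum of the information
    of the subsets induced by the values of the split feature."""
    subsets = {}
    for item in items:
        subsets.setdefault(feature(item), []).append(item)
    weighted = sum(len(sub) / len(items) * information(sub, category)
                   for sub in subsets.values())
    return information(items, category) - weighted

# Hypothetical items: the feature "f" perfectly predicts the category "c",
# so the gain equals the full information of the initial set (1 bit here).
items = [{"f": "a", "c": 0}, {"f": "a", "c": 0},
         {"f": "b", "c": 1}, {"f": "b", "c": 1}]
gain = information_gain(items, lambda i: i["f"], lambda i: i["c"])
assert gain == 1.0
```

A feature whose subsets are as mixed as the original set yields a gain of zero, which is why a greedy tree builder prefers the feature with the highest gain at each split.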
We are now able to summarize the main features emerging from the analysis:

Classification. This is the main goal of the system: the system must be able to assign a category to an item based on some criteria.
Classifier training. To fulfil the previous goal, the system must be able to capture a set of criteria from an existing set of items.

Problem representation. The tool is problem-independent; this means that the user should be allowed to represent the specific problem in terms of items, features and categories.

Criteria representation. The outcome of the training must be represented in a human-readable format, which can be checked by experts.
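As a sketch of what such a problem-independent representation could look like (all names here are hypothetical and not taken from the tool being described):

```python
from dataclasses import dataclass, field

@dataclass
class Problem:
    """A specific problem expressed in terms of features, categories
    and items, independently of the classification machinery."""
    features: list          # feature names, e.g. ["outlook", "wind"]
    categories: list        # admissible categories, e.g. ["yes", "no"]
    items: list = field(default_factory=list)  # (feature values, known category)

    def add_item(self, values: dict, category):
        # Every declared feature needs a value; the category must be admissible.
        assert set(values) == set(self.features)
        assert category in self.categories
        self.items.append((values, category))

# Hypothetical usage: a toy weather problem
p = Problem(features=["outlook", "wind"], categories=["yes", "no"])
p.add_item({"outlook": "sunny", "wind": "weak"}, "yes")
assert len(p.items) == 1
```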
The following functionalities need to be tested carefully:
The most important is the correct construction of the classifier from a set
of items. The correctness of the classifier can be tested by checking whether
it assigns the expected category to items whose category is known.
It is also important to check that the internal representation of the classi-
fier is implemented correctly and that it can be represented in a readable