Building a Recommendation Engine: The XELOPES Library - Realtime Data Mining

Notice that MiningInputStream implements the interface MiningVectorSet, which is the most general container of mining vectors. MiningVectorSet contains only the method getMetaData, which gives access to the basis of the attribute space. MiningInputStream, as a container of countable sets of mining vectors, is the most important implementation of MiningVectorSet. However, some applications, such as reinforcement learning (Sect. 12.2.2), require access to uncountable sets of mining vectors, for example all points of a domain or all unit vectors starting from one point. In this case, the mining vector set must be represented at a more abstract level, for instance by functions of the boundary or by geometric objects. We will return to this topic in Sect. 12.2.2; at this point, we only mention that MiningInputStream is sufficient for data mining applications.

MiningInputStream offers a graded spectrum of data access methods, depending on its implementation. In the simplest case, the data matrix can be traversed only once, using a cursor-based approach with the method next. If the reset method is supported, the cursor can be set back to the initial position. This access type is often supported by files and databases. In a more comfortable case, the cursor can be moved arbitrarily using the move method (e.g., for databases supporting JDBC 2.0). Even more comfortable is direct access to the data array of the data matrix, if the matrix fits into memory (e.g., class MiningStoredData).

The read method returns the mining vector at the current cursor position. Each full implementation of MiningInputStream must support at least the next and read methods. In addition, MiningInputStream may implement the interface MiningOutputStream to write data to the data source. Each mining input stream is reflective: the method getSupportedStream returns all data access (and update) methods supported by the current implementation.
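The access levels described above can be illustrated with a minimal sketch. It is not the XELOPES API: the type names SketchInputStream, StoredSketchStream, and AccessMethod, the use of double[] as a mining vector, and the method getSupported (standing in for getSupportedStream) are all assumptions made for illustration. Only the method names next, read, reset, and move are taken from the text.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical access levels; the real XELOPES constants may differ.
enum AccessMethod { NEXT, READ, RESET, MOVE }

// Simplified stand-in for a mining input stream over double[] vectors.
abstract class SketchInputStream {
    // Advances the cursor; returns false when no vector is left.
    abstract boolean next();

    // Returns the vector at the current cursor position.
    abstract double[] read();

    // Optional operations: unsupported unless a subclass overrides them.
    void reset() { throw new UnsupportedOperationException("reset"); }
    boolean move(int position) { throw new UnsupportedOperationException("move"); }

    // Reflection: reports which access methods this implementation offers.
    abstract Set<AccessMethod> getSupported();
}

// In-memory variant that supports every access level.
class StoredSketchStream extends SketchInputStream {
    private final List<double[]> rows;
    private int cursor = -1; // positioned before the first vector

    StoredSketchStream(List<double[]> rows) { this.rows = rows; }

    @Override boolean next() {
        if (cursor + 1 >= rows.size()) return false;
        cursor++;
        return true;
    }

    @Override double[] read() {
        if (cursor < 0 || cursor >= rows.size()) {
            throw new IllegalStateException("cursor not on a vector");
        }
        return rows.get(cursor);
    }

    @Override void reset() { cursor = -1; }

    @Override boolean move(int position) {
        if (position < 0 || position >= rows.size()) return false;
        cursor = position;
        return true;
    }

    @Override Set<AccessMethod> getSupported() {
        return EnumSet.allOf(AccessMethod.class);
    }
}
```

A caller would first check getSupported() and fall back to a single forward pass with next and read when RESET or MOVE is missing.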

The mining input stream concept is a direct consequence of the fact that almost every data mining algorithm requires a data matrix as input. In the language of CWM, we would say: the logical model of data mining is of the Classifier type. Thus, MiningInputStream extends the CWM class Class.

The physical model describes the physical data source that is used for mining, such as a text file or a database. For the data mining process, the physical model must be mapped to the logical one (Fig. 12.7).

In XELOPES, this mapping is done by subclassing: different types of physical data sources can be accessed through different mining input stream classes that extend MiningInputStream. Important stream classes of XELOPES are listed in Table 12.1. Often, it is useful to write custom resource classes that extend MiningInputStream or one of its subclasses.
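As an illustration of mapping a physical source to the logical model by subclassing, the following sketch reads comma-separated numeric text, one vector per line. The classes LogicalVectorStream and CsvVectorStream are hypothetical and do not reproduce the XELOPES class hierarchy. The sketch supports only forward traversal with next and read, which corresponds to the simplest access level.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Logical model: hypothetical minimal stream contract (not the XELOPES signatures).
abstract class LogicalVectorStream {
    abstract boolean next();   // advance the cursor to the next vector
    abstract double[] read();  // vector at the current cursor position
}

// Physical model: comma-separated numeric text, one vector per line.
class CsvVectorStream extends LogicalVectorStream {
    private final BufferedReader reader;
    private double[] current;

    CsvVectorStream(Reader source) {
        this.reader = new BufferedReader(source);
    }

    @Override boolean next() {
        try {
            String line = reader.readLine();
            // End of input or an empty line ends the stream in this sketch.
            if (line == null || line.trim().isEmpty()) {
                current = null;
                return false;
            }
            String[] fields = line.split(",");
            double[] vector = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                vector[i] = Double.parseDouble(fields[i].trim());
            }
            current = vector;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override double[] read() {
        if (current == null) {
            throw new IllegalStateException("cursor not on a vector");
        }
        return current;
    }
}
```

Any java.io.Reader can serve as the source, for example a FileReader for a text file or a StringReader for testing.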

Notice that the last three streams in Table 12.1 are composed streams which take an arbitrary mining input stream as input and apply a transformation and multidimensional selection/ordering to the stream, respectively.
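The idea of a composed stream can be sketched as a wrapper that takes any inner stream and applies a selection and a transformation to it. The names VectorSource, ListSource, and ComposedSource are assumptions for illustration and are not the XELOPES classes from Table 12.1; ordering is left out of this sketch.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical minimal stream contract, not the XELOPES signatures.
interface VectorSource {
    boolean next();
    double[] read();
}

// Wraps a list as a source so that the example is runnable.
class ListSource implements VectorSource {
    private final Iterator<double[]> it;
    private double[] current;

    ListSource(List<double[]> rows) { this.it = rows.iterator(); }

    public boolean next() {
        if (!it.hasNext()) { current = null; return false; }
        current = it.next();
        return true;
    }

    public double[] read() { return current; }
}

// Composed stream: selects vectors from an inner source, then transforms them.
class ComposedSource implements VectorSource {
    private final VectorSource inner;
    private final Predicate<double[]> keep;
    private final UnaryOperator<double[]> transform;
    private double[] current;

    ComposedSource(VectorSource inner, Predicate<double[]> keep,
                   UnaryOperator<double[]> transform) {
        this.inner = inner;
        this.keep = keep;
        this.transform = transform;
    }

    public boolean next() {
        // Skip inner vectors until one passes the selection.
        while (inner.next()) {
            double[] v = inner.read();
            if (keep.test(v)) {
                current = transform.apply(v);
                return true;
            }
        }
        current = null;
        return false;
    }

    public double[] read() { return current; }
}
```

Because ComposedSource itself implements VectorSource, composed streams can be nested, so a selection can be stacked on top of a transformation of any underlying source.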
