2.1 Types of Data Input
With regard to the text sequences considered in this paper, Greer (2011) describes
how a time element can be used to define sequences of events that might contain
groups of concepts. A time stamp can record when the concept is presented to the
concept base, with groups presented at the same time being considered to be related
to each other. This is therefore built largely on the 'use' of the system, where these
concept sequences could be recognised and learnt by something resembling a neural
network, for example. The uncertainty of the real world means that concept
sequences are unlikely to always be the same, and so key to success is the
ability to generalise over the data and also to accommodate a certain level of
randomness or noise. The intention is that the neural network will be able to do this
relatively well. It is also true that there is a lot of existing structure already available
in information sources, but it might not be clear what the best form of that is. Online
datasets, for example, can be continuous streams of information, defined by time
stamps. While the data will contain structure, there is no clearly defined start or end,
but more of a continuous and cyclic list of information, from which clear patterns
need to be recognised.
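As a minimal sketch of this time-stamped grouping, the following assumes a hypothetical stream of (time stamp, concept) pairs; concepts presented with the same stamp are collected into a related group:

```python
from collections import defaultdict

# Hypothetical event stream: (time stamp, concept) pairs. Concepts that
# arrive with the same time stamp are treated as related to each other.
events = [
    (1, "coffee"), (1, "milk"),
    (2, "bread"), (2, "butter"),
    (3, "coffee"), (3, "sugar"),
]

# Group the stream by time stamp to recover the related concept groups.
groups = defaultdict(list)
for stamp, concept in events:
    groups[stamp].append(concept)

related = [groups[t] for t in sorted(groups)]
print(related)  # [['coffee', 'milk'], ['bread', 'butter'], ['coffee', 'sugar']]
```

The same grouping step would apply to a continuous stream, with each newly arriving stamp opening a new candidate group.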
As well as events, text might be presented in the form of static documents or
papers that need to be classified. For the proposed system, there are some simple
answers to the problem of how to recognise the existing structure. The author has
also been working on a text-based processing application. One feature of the text
processor is the ability to generate sorted lists of words from whole text documents.
Word lists can also appear as cyclic lists and patterns can again be recognised. This
current section of text, for example, is a list of words with nested patterns. In that
case, structure could be recognised as a sequence, ending when the word that started
the sequence is encountered again. To sort the text, each term in the sequence could
be assigned a count of the number of times it has occurred, as part of the sequence.
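This reading of a cyclic word list can be sketched as follows; the word list and the `extract_sequence` helper are hypothetical illustrations, not taken from the described system:

```python
from collections import Counter

def extract_sequence(words, start_index=0):
    """Read a sequence from a word list, ending when the word that
    started the sequence is encountered again (a cyclic pattern)."""
    first = words[start_index]
    seq = [first]
    for word in words[start_index + 1:]:
        if word == first:  # cycle closed: the starting word recurs
            break
        seq.append(word)
    # Each term carries a count of how often it occurred in the sequence.
    return seq, Counter(seq)

words = ["tree", "concept", "base", "concept", "tree", "node"]
seq, counts = extract_sequence(words)
print(seq)  # ['tree', 'concept', 'base', 'concept']
print(counts["concept"], counts["base"])
```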
How many times does 'tree' follow 'concept', for example, but a sequence can be
more than one word deep. Sequences that contain the same words, or overlap, can
be combined, to create the concept trees in the concept base. To select starting or
base words, for example, a bag-of-words with frequency counts can determine the
most popular ones. The decision might then be to read or process text sequences
only if they start with these key words. Pre-formatting or filtering of the text can
also be performed. Because this information would be created from existing text
documents, the process would be more semantic and knowledge-based. This does
not exclude the addition of a time element, however, and a global system would
benefit from using all of these attributes.
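A minimal sketch of this selection step, assuming a hypothetical text fragment and some hand-made candidate sequences: a bag-of-words with frequency counts picks the most popular terms as base words, and only sequences starting with one of them are kept.

```python
from collections import Counter

text = (
    "a concept base links concept trees where the base stores each "
    "concept branch and the base grows with every concept"
)
words = text.split()

# Bag-of-words with frequency counts: the most popular terms become
# the starting (base) words for concept trees.
bag = Counter(words)
base_words = {w for w, _ in bag.most_common(3)}

# Filtering: only read a text sequence if it starts with a key word.
sequences = [
    ["concept", "tree", "branch"],
    ["random", "noise", "words"],
    ["base", "word", "list"],
]
kept = [s for s in sequences if s[0] in base_words]
print(kept)  # [['concept', 'tree', 'branch'], ['base', 'word', 'list']]
```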
The concept trees can then evolve, adding and updating branches as new
information is received. Processing just a few test documents, however, shows that
different word sorts of the original data will produce different sequences, from
which these basic structures are built, so the decision of correct structure is still
quite arbitrary. On the technical front, it might be more correct to always
use complete lists of concepts, as they are presented or received, and then try to
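The evolving structure described above can be sketched as a trie-like nested dictionary, where sequences sharing a prefix merge into the same branch and counts are updated as new information arrives; the `add_sequence` helper and the example sequences are hypothetical:

```python
# Minimal sketch of an evolving concept tree: overlapping sequences
# merge into shared branches, each node keeping an occurrence count.
def add_sequence(tree, sequence):
    node = tree
    for word in sequence:
        child = node.setdefault(word, {"count": 0, "children": {}})
        child["count"] += 1  # update the branch on each visit
        node = child["children"]

tree = {}
add_sequence(tree, ["concept", "tree", "branch"])
add_sequence(tree, ["concept", "tree", "node"])  # overlap: merged branch
add_sequence(tree, ["concept", "base"])

print(tree["concept"]["count"])             # 3
print(sorted(tree["concept"]["children"]))  # ['base', 'tree']
```

Because the three sequences share prefixes, they collapse into a single tree rooted at 'concept', rather than three separate structures.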