Furthermore, we aim for a multisensor system capable of diagnosing a complex
scenario. This requires using the capabilities of all the different sensors in a
cooperative/collaborative way, providing mechanisms to interpret their information
jointly, and making it possible to identify behaviours and situations by merging
data from the different sensors at the different stages of the system (INT3). We
speak of diagnosis as a stage complementary to monitoring, which is more usually
confined to immediate detection at the lower abstraction levels of
interpretation [1].
Of course, in video-controlled environments the camera sensor is of major
importance, although other types of sensors have processing stages very similar
to those of artificial vision (acquisition, segmentation, identification, tracking),
so what is presented in this work is generalisable to inputs from all sensor
types. Multiple sensor integration depends on the application and on the type of
data or signals it uses. It is fundamental to bear in mind the technical
processing requirements of each and every video source present in the environment
under surveillance. The computational requirements grow in large monitored
scenarios: multiple cameras, visual and lighting difficulties, a variety of
sensors, etc. For this reason, strictly centralised processing is not a suitable
solution to the problem, since it does not scale well. Conversely, embedded,
distributed processing in nodes (which include data capture) is a promising
solution, and it has usually been addressed with agents [2].
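As an illustration only (the class, stage and event names below are our own
assumptions, not the implementation of the agent-based systems cited above), the
following Python sketch shows how such an embedded node could encapsulate the
acquisition, segmentation, identification and tracking stages locally and forward
only symbolic events to the rest of the system:

```python
# A minimal sketch, assuming a publish callback towards a central node;
# names and interfaces are illustrative, not those of the cited systems.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Event:
    """Symbolic event emitted by a node towards the central/fusion stage."""
    node_id: str
    frame_idx: int
    label: str                                  # e.g. "track_updated"
    attributes: Dict[str, Any] = field(default_factory=dict)


class ProcessingNode:
    """Embedded node: data capture plus the local vision pipeline."""

    def __init__(self, node_id: str, publish: Callable[[Event], None]):
        self.node_id = node_id
        self.publish = publish                  # callback towards the central node

    # Placeholders for the real algorithms of each stage.
    def acquire(self, frame_idx: int) -> Any:
        return None                             # grab a frame from the sensor

    def segment(self, frame: Any) -> List[Any]:
        return []                               # foreground / motion regions

    def identify(self, regions: List[Any]) -> List[Dict[str, Any]]:
        return []                               # classify regions into objects

    def track(self, objects: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return []                               # associate objects over time

    def step(self, frame_idx: int) -> None:
        """One acquisition-segmentation-identification-tracking cycle."""
        frame = self.acquire(frame_idx)
        regions = self.segment(frame)
        objects = self.identify(regions)
        for track in self.track(objects):
            self.publish(Event(self.node_id, frame_idx, "track_updated", track))
```

Keeping capture and processing inside the node means that only compact symbolic
events, rather than raw video, need to travel through the system, which is what
makes this approach scale.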
A distributed video-monitoring system presents additional problems when
integrating the information from each of the monitoring nodes. At this point the
information is redundant, contradictory and heterogeneous, which is a drawback of
the approach. Solving the problem calls for filtering, merging and standardising
the information coming from each observation and processing node in the monitored
environment. This is the focal point of this work: to provide a model that solves
this integration problem.
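The following sketch, again purely illustrative (the common schema, field names
and thresholds are assumptions, not the paper's model), shows the kind of
standardising and merging step this calls for: each node report is mapped onto a
common record, and observations that are close in space and time are fused,
resolving contradictory labels by confidence:

```python
# Illustrative sketch of the filter/merge/standardise stage; schema and
# thresholds are assumptions made for the example.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Observation:
    """A node report already translated into the common (world) frame."""
    node_id: str
    timestamp: float
    position: Tuple[float, float]   # world coordinates, metres
    label: str                      # e.g. "person", "vehicle"
    confidence: float               # 0..1


def standardise(raw: Dict, node_id: str) -> Observation:
    """Map one node's native report format onto the common schema."""
    return Observation(
        node_id=node_id,
        timestamp=float(raw["t"]),
        position=(float(raw["x"]), float(raw["y"])),
        label=str(raw.get("class", "unknown")),
        confidence=float(raw.get("score", 0.5)),
    )


def fuse(observations: List[Observation],
         max_dist: float = 1.0, max_dt: float = 0.5) -> List[Observation]:
    """Greedily merge observations that are close in space and time.

    Redundant reports are averaged; contradictory labels are resolved by
    keeping the most confident one. A simple stand-in for the merging stage.
    """
    fused: List[Observation] = []
    for obs in sorted(observations, key=lambda o: o.timestamp):
        for i, kept in enumerate(fused):
            dx = obs.position[0] - kept.position[0]
            dy = obs.position[1] - kept.position[1]
            if (dx * dx + dy * dy) ** 0.5 <= max_dist and \
               abs(obs.timestamp - kept.timestamp) <= max_dt:
                best = obs if obs.confidence > kept.confidence else kept
                fused[i] = Observation(
                    node_id=best.node_id,
                    timestamp=max(obs.timestamp, kept.timestamp),
                    position=((obs.position[0] + kept.position[0]) / 2,
                              (obs.position[1] + kept.position[1]) / 2),
                    label=best.label,
                    confidence=max(obs.confidence, kept.confidence),
                )
                break
        else:
            fused.append(obs)       # no nearby observation: keep as new object
    return fused
```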
Usually, a symbolic description of the scene is obtained from the video sequences
by analysing motion [3,4]. The works in [5,6,7] focus on recognising events
related to vehicles and humans in an airport environment. In the continuation of
these works [8,9], a method is presented for recognising video events with a
tracking framework and Bayesian networks that use data on the surroundings and on
the trajectory. In [10] composite events are analysed with hidden Markov models.
An activity is considered a composition of different “action threads”, which are
processed separately but related by specific temporal constraints.
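As a toy illustration of this composition idea only (it does not reproduce the
HMM-based method of [10]; the thread names and relations are invented here), an
activity can be declared as a set of action threads plus temporal constraints
between their time intervals:

```python
# Illustrative only: composing "action threads" with temporal constraints.
from typing import Dict, List, Tuple

Interval = Tuple[float, float]           # (start, end) of a recognised thread


def before(a: Interval, b: Interval) -> bool:
    return a[1] <= b[0]                  # a ends before b starts


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]   # the two threads share some time


def activity_recognised(threads: Dict[str, Interval],
                        constraints: List[Tuple[str, str, str]]) -> bool:
    """Check that every required thread exists and every constraint holds."""
    relations = {"before": before, "overlaps": overlaps}
    for name_a, relation, name_b in constraints:
        if name_a not in threads or name_b not in threads:
            return False
        if not relations[relation](threads[name_a], threads[name_b]):
            return False
    return True


# Example: a two-thread activity with one temporal constraint between them.
observed = {"person_drops_bag": (10.0, 12.0), "person_leaves_area": (15.0, 20.0)}
rules = [("person_drops_bag", "before", "person_leaves_area")]
print(activity_recognised(observed, rules))   # True
```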
In our earlier works in this area, we started from an architecture faithful to
the abstraction-layer model [1], whose layers communicate with each other
bottom-up (emergence) and top-down (feedback), in the style of neurocomputing, in
distributed multisensor capture and processing nodes (CEDI) that communicate via
events with a central node [11]. This central node processes the highest
abstraction levels with declarative models [2,12], where an overall, more
complete view of the monitored scene is required. Our line of work aims to
develop a global architecture, which we call “Architecture for Semantic
Interpretation of