Furthermore, we aim for a multisensor system capable of diagnosing a complex
scenario. This requires using the capabilities of all the different sensors in a
cooperative/collaborative way, providing mechanisms to interpret their information
jointly, and making it possible to identify behaviours and situations by merging
data from the different sensors at the different stages of the system (INT3). We
speak of diagnosis as a stage complementary to monitoring, which is more usually
confined to immediate detection at the lower abstraction levels of
interpretation [1].
Of course, in video-controlled environments the camera sensor is of major
importance, although other types of sensors have processing stages very similar
to those of artificial vision (acquisition, segmentation, identification, tracking),
so what is presented in this work is generalisable to inputs from all sensor
types. Multiple sensor integration depends on the application and on the type of
data or signals it uses. It is fundamental to bear in mind the technical
processing requirements of each and every video source present in the environment
under surveillance. The computational requirements grow in large monitored
scenarios: multiple cameras, visual and lighting difficulties, a variety of
sensors, etc. For this reason, strictly centralised processing is not a suitable
solution to the problem, since it does not scale well. Conversely, embedded,
distributed processing in nodes (which include data capture) is a promising
solution, and it has usually been addressed with agents [2].
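As an illustration only (the class, stage and event names below are our own
assumptions, not the implementation of the agent-based systems cited above), the
following Python sketch shows how such an embedded node could encapsulate the
acquisition, segmentation, identification and tracking stages locally and forward
only symbolic events to the rest of the system:

```python
# A minimal sketch, assuming a publish callback towards a central node;
# names and interfaces are illustrative, not those of the cited systems.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Event:
    """Symbolic event emitted by a node towards the central/fusion stage."""
    node_id: str
    frame_idx: int
    label: str                                  # e.g. "track_updated"
    attributes: Dict[str, Any] = field(default_factory=dict)


class ProcessingNode:
    """Embedded node: data capture plus the local vision pipeline."""

    def __init__(self, node_id: str, publish: Callable[[Event], None]):
        self.node_id = node_id
        self.publish = publish                  # callback towards the central node

    # Placeholders for the real algorithms of each stage.
    def acquire(self, frame_idx: int) -> Any:
        return None                             # grab a frame from the sensor

    def segment(self, frame: Any) -> List[Any]:
        return []                               # foreground / motion regions

    def identify(self, regions: List[Any]) -> List[Dict[str, Any]]:
        return []                               # classify regions into objects

    def track(self, objects: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return []                               # associate objects over time

    def step(self, frame_idx: int) -> None:
        """One acquisition-segmentation-identification-tracking cycle."""
        frame = self.acquire(frame_idx)
        regions = self.segment(frame)
        objects = self.identify(regions)
        for track in self.track(objects):
            self.publish(Event(self.node_id, frame_idx, "track_updated", track))
```

Keeping capture and processing inside the node means that only compact symbolic
events, rather than raw video, need to travel through the system, which is what
makes this approach scale.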
A distributed video-monitoring system presents additional problems when
integrating the information from each of the monitoring nodes. At this point the
information is redundant, contradictory and heterogeneous, which is a drawback of
the approach. Solving the problem calls for filtering, merging and standardising
the information coming from each observation and processing node in the monitored
environment. This is the focal point of this work: to provide a model that solves
this integration problem.
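The following sketch, again purely illustrative (the common schema, field names
and thresholds are assumptions, not the paper's model), shows the kind of
standardising and merging step this calls for: each node report is mapped onto a
common record, and observations that are close in space and time are fused,
resolving contradictory labels by confidence:

```python
# Illustrative sketch of the filter/merge/standardise stage; schema and
# thresholds are assumptions made for the example.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Observation:
    """A node report already translated into the common (world) frame."""
    node_id: str
    timestamp: float
    position: Tuple[float, float]   # world coordinates, metres
    label: str                      # e.g. "person", "vehicle"
    confidence: float               # 0..1


def standardise(raw: Dict, node_id: str) -> Observation:
    """Map one node's native report format onto the common schema."""
    return Observation(
        node_id=node_id,
        timestamp=float(raw["t"]),
        position=(float(raw["x"]), float(raw["y"])),
        label=str(raw.get("class", "unknown")),
        confidence=float(raw.get("score", 0.5)),
    )


def fuse(observations: List[Observation],
         max_dist: float = 1.0, max_dt: float = 0.5) -> List[Observation]:
    """Greedily merge observations that are close in space and time.

    Redundant reports are averaged; contradictory labels are resolved by
    keeping the most confident one. A simple stand-in for the merging stage.
    """
    fused: List[Observation] = []
    for obs in sorted(observations, key=lambda o: o.timestamp):
        for i, kept in enumerate(fused):
            dx = obs.position[0] - kept.position[0]
            dy = obs.position[1] - kept.position[1]
            if (dx * dx + dy * dy) ** 0.5 <= max_dist and \
               abs(obs.timestamp - kept.timestamp) <= max_dt:
                best = obs if obs.confidence > kept.confidence else kept
                fused[i] = Observation(
                    node_id=best.node_id,
                    timestamp=max(obs.timestamp, kept.timestamp),
                    position=((obs.position[0] + kept.position[0]) / 2,
                              (obs.position[1] + kept.position[1]) / 2),
                    label=best.label,
                    confidence=max(obs.confidence, kept.confidence),
                )
                break
        else:
            fused.append(obs)       # no nearby observation: keep as new object
    return fused
```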
Usually, a symbolic description of the scene is obtained from the video sequences
by analysing motion [3,4]. The works in [5,6,7] focus on recognising events
related to vehicles and humans in an airport environment. In the continuation of
these works [8,9], a method is presented for recognising video events with a
tracking framework and Bayesian networks that use data on the surroundings and on
the trajectory. In [10] composite events are analysed with hidden Markov models.
An activity is considered a composition of different “action threads”, which are
processed separately but related by specific temporal constraints.
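As a toy illustration of this composition idea only (it does not reproduce the
HMM-based method of [10]; the thread names and relations are invented here), an
activity can be declared as a set of action threads plus temporal constraints
between their time intervals:

```python
# Illustrative only: composing "action threads" with temporal constraints.
from typing import Dict, List, Tuple

Interval = Tuple[float, float]           # (start, end) of a recognised thread


def before(a: Interval, b: Interval) -> bool:
    return a[1] <= b[0]                  # a ends before b starts


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]   # the two threads share some time


def activity_recognised(threads: Dict[str, Interval],
                        constraints: List[Tuple[str, str, str]]) -> bool:
    """Check that every required thread exists and every constraint holds."""
    relations = {"before": before, "overlaps": overlaps}
    for name_a, relation, name_b in constraints:
        if name_a not in threads or name_b not in threads:
            return False
        if not relations[relation](threads[name_a], threads[name_b]):
            return False
    return True


# Example: a two-thread activity with one temporal constraint between them.
observed = {"person_drops_bag": (10.0, 12.0), "person_leaves_area": (15.0, 20.0)}
rules = [("person_drops_bag", "before", "person_leaves_area")]
print(activity_recognised(observed, rules))   # True
```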
In our earlier works in this area, we started from an architecture faithful to
the abstraction-layer model [1], whose layers communicate with each other
bottom-up (emergence) and top-down (feedback), in the style of neurocomputing, in
distributed multisensor capture and processing nodes (CEDI) that communicate via
events with a central node [11]. This central node processes the highest
abstraction levels with declarative models [2,12], where an overall, more
complete view of the monitored scene is required. Our line of work aims to
develop a global architecture, which we call “Architecture for Semantic
Interpretation of