Magnus Johnsson

Cognitive Scientist, Computer Scientist

magnus@magnusjohnsson.se

Action Recognition

Together with Miriam Buonamente and Haris Dindo from the RoboticsLab at the University of Palermo in Italy, and Zahra Gharaee and Peter Gärdenfors from Lund University, I have developed a hierarchical neural network architecture that uses a hierarchy of Self-Organizing Maps (SOMs) to recognize actions. The system and its general design were originally conceived in 2012. The aim was a system able to learn to recognize people's actions and guess their intentions, and to integrate with systems implementing other faculties in a cognitive architecture with the aid of an internal simulation mechanism employing associative self-organizing maps (A-SOMs). The system recognizes actions without having to segment the input stream.

Since 2012 we have done extensive research on how to make the action recognition system work well in numerous variants and implementations, categorizing both 2D movies of agents performing actions and sequences of sets of joint positions obtained from 3D cameras. In parallel, Miriam Buonamente, Haris Dindo and I have done extensive research, with promising results, on versions of the architecture that use A-SOMs to internally simulate the likely continuation of partly seen actions. This is a biologically inspired way of achieving sequence completion, i.e. completing the missing parts of patterns extended in time, and a significant step towards giving the system the ability to guess the intentions of the observed individuals. A-SOMs also make it possible to elicit expectations across different modalities, which will help integrate the architecture with systems implementing other faculties, although this has not yet been tested in practice with this action recognition architecture.

The basic architecture is composed of hierarchical layers of neural networks that self-organize into topologically ordered representations of increasingly complex human activity features. These layers are combined with a suitable preprocessing mechanism and with some mechanism that transforms activity patterns unfolding over time into a spatial representation (both of which depend on the variant of the architecture). The implementations have employed two such layers: one that develops a representation of the key postures (i.e. postures that are distinguishable at the resolution given by the number of neurons involved), and one that develops a topologically ordered representation of movements (whose length is less than or equal to that of an action, depending on the method used to transform activity patterns unfolding over time into a spatial representation). By choosing a time length for the spatial transformation of movements that suits the particular set of actions the architecture is meant to recognize, and by 'smoothing' the output, it is possible to obtain a continuous guess of the ongoing action without segmentation. A further, more biologically plausible development for the future would be to implement the spatial transformation with recurrent connections or leaky integrators in the key posture representation.
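The page does not say how the output is smoothed. As one possible illustration, the sketch below assumes a sliding-window majority vote over the per-frame action guesses; `smooth_labels` and its `window` parameter are hypothetical names, not taken from the published implementations.

```python
from collections import Counter, deque


def smooth_labels(frame_labels, window=5):
    """Return one smoothed label per frame (hypothetical helper).

    Each output is the most frequent label among the last `window`
    per-frame guesses; ties go to the label that appears earliest
    in the window.
    """
    recent = deque(maxlen=window)
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


raw = ["walk", "walk", "wave", "walk", "walk", "sit", "sit", "sit", "sit"]
print(smooth_labels(raw, window=3))
# → ['walk', 'walk', 'walk', 'walk', 'walk', 'walk', 'sit', 'sit', 'sit']
```

The isolated "wave" guess is suppressed, at the cost of a lag of roughly half a window before a genuine change of action ("sit") is reported; the window length trades stability against responsiveness.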

The kind of preprocessing varies with the input source and the particular version of the architecture. When sets of joint positions extracted from the stream of depth images of a 3D camera are used, a straightforward transformation into egocentric coordinates, together with scaling, is an efficient way to obtain invariance to the camera's viewing angle and to the size and distance of the observed person.
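A minimal sketch of such a transformation, under assumptions not stated on this page: joints arrive as a dictionary of `(x, y, z)` positions with `y` vertical, the body is centred on a reference joint, rotated about the vertical axis so that the shoulder line points along +x, and divided by the torso length. The joint names (`hip_center`, `shoulder_left`, and so on) are hypothetical defaults.

```python
import math


def to_egocentric(joints, origin="hip_center", left="shoulder_left",
                  right="shoulder_right",
                  scale_pair=("hip_center", "shoulder_center")):
    """Map {joint name: (x, y, z)} to a body-centred, size-normalised frame.

    Assumes y is the vertical axis; the joint names are hypothetical.
    """
    ox, oy, oz = joints[origin]
    # 1. Translate so the reference joint is at the origin
    #    (invariance to where the person stands relative to the camera).
    shifted = {n: (x - ox, y - oy, z - oz) for n, (x, y, z) in joints.items()}
    # 2. Rotate about the vertical axis so the left-to-right shoulder
    #    line points along +x (invariance to the viewing angle).
    lx, _, lz = shifted[left]
    rx, _, rz = shifted[right]
    theta = math.atan2(rz - lz, rx - lx)
    c, s = math.cos(theta), math.sin(theta)
    rotated = {n: (c * x + s * z, y, -s * x + c * z)
               for n, (x, y, z) in shifted.items()}
    # 3. Divide by a body-length measure (invariance to body size).
    a, b = scale_pair
    size = math.dist(rotated[a], rotated[b])
    return {n: tuple(v / size for v in p) for n, p in rotated.items()}
```

Two recordings of the same posture that differ only by position, rotation about the vertical axis, and overall scale map to the same egocentric coordinates. Using the shoulders and torso is just one reasonable choice of reference; any stable pair of joints would serve.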

The actual activity feature layers are implemented with SOMs (or recurrently connected A-SOMs if the ability to internally simulate is needed). Because SOMs (and A-SOMs) learn feature representations without supervision, they place lower demands on the data available for training: no labelled examples are needed to form the representations.
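For readers unfamiliar with SOMs, the following is a minimal sketch of a standard Kohonen SOM in plain Python: a grid of weight vectors, a best-matching unit found by Euclidean distance, and a Gaussian neighbourhood whose width and learning rate shrink linearly during training. The schedule and parameter values are assumptions for illustration, not those used in the publications listed below.

```python
import math
import random


class SOM:
    """A minimal Kohonen self-organizing map on a rows x cols grid."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        self.weights = {(r, c): [rng.random() for _ in range(dim)]
                        for r in range(rows) for c in range(cols)}

    def winner(self, x):
        """Grid coordinates of the best-matching unit for input x."""
        return min(self.weights,
                   key=lambda rc: sum((w - v) ** 2
                                      for w, v in zip(self.weights[rc], x)))

    def train(self, data, epochs=20, lr0=0.5, sigma0=None):
        """Unsupervised training: no labels are used anywhere."""
        sigma0 = sigma0 or max(self.rows, self.cols) / 2
        steps = epochs * len(data)
        rng = random.Random(1)
        t = 0
        for _ in range(epochs):
            order = list(data)
            rng.shuffle(order)
            for x in order:
                frac = t / steps
                lr = lr0 * (1 - frac)                  # decaying learning rate
                sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighbourhood
                br, bc = self.winner(x)
                for (r, c), w in self.weights.items():
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * sigma ** 2))
                    for i in range(len(w)):
                        w[i] += lr * h * (x[i] - w[i])
                t += 1


som = SOM(4, 4, 2)
data = [(0.01 * i, 0.0) for i in range(10)] + [(1.0, 1.0 - 0.01 * i) for i in range(10)]
som.train(data)
print(som.winner((0.0, 0.0)), som.winner((1.0, 1.0)))
```

Because neighbouring neurons are pulled towards the same inputs, nearby inputs end up represented by nearby neurons, which is the topological ordering the hierarchy relies on.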

The first-layer SOM performs dimensionality reduction and forms a topologically ordered representation of individual postures (or of other low-level features, such as the first- or second-order dynamics of postures, or a combination of these), which means that similar postures are represented close to each other. When the system receives input, a trajectory of activity over the represented key postures or dynamics unfolds during an action. Since sufficiently similar postures are represented by the same neuron (what counts as sufficiently similar depends on the number of neurons), a particular movement carried out at various speeds will elicit the same activity trajectory in the SOM, i.e. the same sequence of activated neurons once consecutive repetitions of the same neuron are disregarded. Thus time invariance is achieved. And since similar postures are represented close to each other in the SOM, similar movements carried out by the acting agent are represented as similar trajectories (i.e. similar sequences of activated neurons).
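The time-invariance argument can be illustrated with a toy example. Here a fixed list of prototype vectors stands in for the weights of a trained first-layer SOM (an assumption for brevity), each frame activates its nearest prototype, and consecutive repeats are collapsed. The same movement sampled at two different speeds then yields the same trajectory.

```python
from itertools import groupby


def nearest(prototypes, x):
    """Index of the prototype (first-layer neuron) closest to posture x."""
    return min(range(len(prototypes)),
               key=lambda i: sum((p - v) ** 2 for p, v in zip(prototypes[i], x)))


def trajectory(prototypes, frames):
    """Sequence of activated neurons with consecutive repeats collapsed."""
    winners = [nearest(prototypes, f) for f in frames]
    return [k for k, _ in groupby(winners)]


# Toy 1-D "postures" (e.g. a single joint angle) and four key-posture neurons.
prototypes = [(0.0,), (1.0,), (2.0,), (3.0,)]
fast = [(i * 0.6,) for i in range(6)]   # movement from 0 to 3 in 6 frames
slow = [(i * 0.2,) for i in range(16)]  # the same movement in 16 frames
print(trajectory(prototypes, fast))  # → [0, 1, 2, 3]
print(trajectory(prototypes, slow))  # → [0, 1, 2, 3]
```

The slow version activates each neuron for more frames, but once repeats are dropped the sequences coincide. With fewer prototypes, more postures would share a neuron, which is the resolution trade-off mentioned above.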

The present activity, together with a suitably long sequence of previous unique activity (the suitable length depends on, and is optimized for, the set of actions the system is trained to recognize), is transformed into an ordered vector representation before entering a second-layer SOM. This SOM develops an ordered spatial representation of sequences that uniquely corresponds to different actions.
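The page does not specify how the sequence is encoded as a vector. The sketch below assumes that each unique first-layer winner is represented by its `(row, col)` grid coordinates and that the last `length` unique winners are concatenated, oldest first, into the input of the second-layer SOM. `TrajectoryBuffer` is a hypothetical name.

```python
from collections import deque


class TrajectoryBuffer:
    """Keeps the last `length` unique first-layer winners (hypothetical helper)."""

    def __init__(self, length):
        self.length = length
        self.recent = deque(maxlen=length)

    def push(self, winner):
        """Register the current winner and return the ordered vector.

        A winner is only appended when the activity moves to a new neuron,
        so holding a posture longer does not change the representation.
        Returns None until `length` unique winners have been seen.
        """
        if not self.recent or self.recent[-1] != winner:
            self.recent.append(winner)
        if len(self.recent) < self.length:
            return None
        # Concatenate (row, col) coordinates, oldest winner first.
        return [coord for rc in self.recent for coord in rc]


buf = TrajectoryBuffer(3)
for w in [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (2, 1)]:
    print(buf.push(w))
```

The last two calls print `[0, 0, 0, 1, 1, 1]` and `[0, 1, 1, 1, 2, 1]`. Because the buffer slides forward with each new unique winner, a vector is available at every moment, which is what allows continuous guessing without segmenting the input.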

The third layer in the hierarchy consists of a neural network that learns to label the activations in the second-layer SOM with their corresponding action labels.
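The page only states that this layer is a neural network that learns the labels. As an assumed illustration, the sketch below uses a single-layer network with one output unit per action, trained with the delta (Widrow-Hoff) rule; the predicted action is the most active output. `LabelLayer` and the one-hot input encoding are hypothetical.

```python
import random


class LabelLayer:
    """Single-layer network mapping second-layer SOM activity to action labels."""

    def __init__(self, n_inputs, labels, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.labels = list(labels)
        self.lr = lr
        # One weight row per action label, small random initial weights.
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_inputs)]
                  for _ in self.labels]

    def outputs(self, x):
        """Linear activation of each output unit."""
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def train(self, samples, epochs=50):
        """Delta rule: move each output towards 1 for its label, 0 otherwise."""
        for _ in range(epochs):
            for x, label in samples:
                y = self.outputs(x)
                for k, row in enumerate(self.w):
                    target = 1.0 if self.labels[k] == label else 0.0
                    err = target - y[k]
                    for i, xi in enumerate(x):
                        row[i] += self.lr * err * xi

    def predict(self, x):
        """Label of the most active output unit."""
        y = self.outputs(x)
        return self.labels[max(range(len(y)), key=y.__getitem__)]


# Toy second-layer SOM with 4 neurons: neurons 0-1 respond to "wave", 2-3 to "walk".
samples = [([1, 0, 0, 0], "wave"), ([0, 1, 0, 0], "wave"),
           ([0, 0, 1, 0], "walk"), ([0, 0, 0, 1], "walk")]
layer = LabelLayer(4, ["wave", "walk"])
layer.train(samples)
print(layer.predict([0, 0, 1, 0]))
```

Only this layer needs labelled data; the two SOM layers below it are trained without labels, which is where the reduced demand on labelled training data comes from.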

Below are some demo videos captured by Zahra Gharaee demonstrating action recognition:

Movie 1. Action recognition with manual segmentation.

Movie 2. Action recognition with determination of the object acted upon. This is a hybrid system composed of an implementation of the hierarchical SOM architecture and a non-neural system, not described here, that determines the object acted upon.

Movie 3. Continuous guessing of the action based on the ongoing movement.

Movie 4. Continuous guessing of the action based on the ongoing movement, applied to the publicly available MSR repository. Notice how the system also makes reasonable guesses based on the agent's movements even before the actions are completed.

Related Publications

Kock, E., Sarwari, Y., Russo, N. and Johnsson, M. (2021). Identifying cheating behaviour with machine learning. In the proceedings of SAIS 2021, Luleå, Sweden.

Gharaee, Z., Gärdenfors, P. and Johnsson, M. (2017). Online Recognition of Actions Involving Objects. Journal of Biologically Inspired Cognitive Architectures.

Gharaee, Z., Gärdenfors, P. and Johnsson, M. (2017). Online Recognition of Actions Involving Objects. In the proceedings of BICA 2017, Moscow, Russian Federation.

Gharaee, Z., Gärdenfors, P. and Johnsson, M. (2017). First and Second Order Dynamics in a Hierarchical SOM system for Action Recognition. Applied Soft Computing, 59, 574-585.

Gharaee, Z., Gärdenfors, P. and Johnsson, M. (2017). Hierarchical Self-Organizing Maps System for Action Classification. In the proceedings of the 9th International Conference on Agents and Artificial Intelligence (ICAART 2017), Porto, Portugal.

Gharaee, Z., Gärdenfors, P. and Johnsson, M. (2016). Action Recognition Online with Hierarchical Self-Organizing Maps. In the proceedings of the 12th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS 2016), Naples, Italy, 538-544.

Buonamente, M., Dindo, H. and Johnsson, M. (2016). Hierarchies of Self-Organizing Maps for Action Recognition. Cognitive Systems Research.

Buonamente, M., Dindo, H. and Johnsson, M. (2015). Discriminating and Simulating Actions with the Associative Self-Organizing Map. Connection Science.

Buonamente, M., Dindo, H. and Johnsson, M. (2014). Action Recognition based on Hierarchical Self-Organizing Maps. In the proceedings of the International Workshop on Artificial Intelligence and Cognition (AIC 2014), Turin, Italy.

Buonamente, M., Dindo, H. and Johnsson, M. (2013). Simulating Actions with the Associative Self-Organizing Map. In the proceedings of the International Workshop on Artificial Intelligence and Cognition (AIC 2013), Turin, Italy.

Buonamente, M., Dindo, H. and Johnsson, M. (2013). Recognizing Actions with the Associative Self-Organizing Map. In the proceedings of the 24th International Conference on Information, Communication and Automation Technologies (ICAT 2013), Sarajevo, Bosnia and Herzegovina.

Johnsson, M. and Buonamente, M. (2012). Internal Simulation of an Agent's Intentions. In the proceedings of Biologically Inspired Cognitive Architectures 2012, Palermo, Italy, 175-176. Springer, ISBN: 978-3-642-34273-8.

Buonamente, M. and Johnsson, M. (2012). Architecture to Serve Disabled and Elderly. In the proceedings of Biologically Inspired Cognitive Architectures 2012, Palermo, Italy, 365-366. Springer, ISBN: 978-3-642-34273-8.