A Survey on Human Activities Recognition Models (Part-2)

Different Human Activities Recognition Models for Video Processing (contd.):
K. K. Htike, O. O. Khalifa, H. A. Ramli and M. A. Abushariah presented an approach to human activity recognition in video that uses sequences of postures [6]. The approach relies on a single static camera. The authors applied K-means, fuzzy C-means, multilayer perceptron, self-organizing map and feed-forward neural network (FFNN) classifiers in both the training and testing phases, and then calculated the accuracy rate of each. The maximum accuracy was achieved with self-organizing maps. The researchers also reported that supervised learning classifiers outperform unsupervised classifiers, and that the recognition rate rises with the number of training sets. They compared their proposed model with other models and achieved better results.
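To make the clustering step concrete, here is a minimal K-means sketch in Python. It is not the authors' implementation: the two-dimensional "posture" vectors, the function name `kmeans`, and the choice of the first k points as initial centroids are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster points (lists of floats) into k groups; return (centroids, labels)."""
    # Assumption: the first k points are used as the initial centroids,
    # which keeps the sketch deterministic.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical 2-D posture descriptors (e.g. two normalized shape features).
postures = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25],
            [0.85, 0.9], [0.05, 0.1], [1.0, 0.95]]
centroids, labels = kmeans(postures, k=2)
print(labels)
```

In a posture-based pipeline, each cluster label could stand for one key posture, so that a video becomes a sequence of labels for a later classifier to interpret.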
H. Kuehne, J. Gall and T. Serre presented an end-to-end generative method for human activity recognition [7]. They integrated a Fisher vector-based visual representation with a structured temporal model. The Fisher vector brings the advantage of front-end generative models (such as Gaussian models). They validated their technique on both recognition and parsing into action units. The authors also analyzed various feature representations and their effect on the overall performance of the system. They used the HTK framework to evaluate their technique. The implementation shows significant gains in accuracy for both recognition and parsing. The model works best when sufficient training data are available.
A. K. Kushwaha and R. Srivastava proposed a framework for human activity recognition from video sequences [8]. The framework is view-invariant and is organized into three modules: 1) detection and localization by background removal, 2) creation of view-invariant spatio-temporal templates, and 3) activity recognition through template matching. In this approach, the background is removed using change detection and background modelling techniques. The researchers used moment invariants and the Mahalanobis distance for template matching of the different activities. They evaluated the framework on their own dataset, the KTH activity recognition dataset, the i3DPost dataset, the MSR dataset, the VideoWeb multiview dataset and the WVU human activity recognition dataset. They concluded from their experiments that the suggested model is efficient, flexible and robust.
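The template-matching step can be illustrated with a small Mahalanobis-distance sketch. This is only an illustration under stated assumptions, not the framework from [8]: the features are restricted to two dimensions so the covariance matrix can be inverted by hand, and the template names, means and covariances are invented.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance between a 2-D feature x and a template (mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    # Quadratic form dx^T * inv * dx.
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

def match_template(x, templates):
    """Return the name of the template closest to x in Mahalanobis distance."""
    return min(templates, key=lambda name: mahalanobis_2d(x, *templates[name]))

# Hypothetical templates: (mean feature vector, covariance matrix) per activity.
templates = {
    "walking": ([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]]),
    "waving": ([4.0, 6.0], [[4.0, 0.0], [0.0, 4.0]]),
}
print(match_template([1.2, 2.1], templates))
```

Unlike the plain Euclidean distance, the Mahalanobis distance scales each direction by the template's spread, so a template with high variance tolerates larger deviations before a sample is considered far from it.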
S. Maćkowiak, P. Gardziński, Ł. Kamiński and K. Kowalak introduced an algorithm for human activity recognition in multiview video [9]. The main aim of this approach was to develop a system that automatically detects human behaviours (such as medical emergencies) and calls for emergency help. They chose a directed graphical model to represent the behaviours; this model can be based on propagation nets and dynamic Bayesian networks. They used voxel reconstruction to rebuild the 3D scene. The researchers demonstrated that the presented system can recognize human activities. Future work may enhance the handling of interactions between objects.
N. Robertson and I. Reid proposed a technique for human activity recognition in video processing [10]. They modelled human activities as stochastic sequences, with each activity represented by a feature vector that captures detection and trajectory information. A probabilistic search method is used for activity recognition, and an HMM is used to smooth the activity sequences. By considering the similarities between predefined HMMs, high-dimensional recognition is achieved. The whole process is organized as a hierarchy of actions: the higher levels of the hierarchy use Bayesian networks, whereas the lower levels use a non-parametric sampling technique. By merging the two, they developed a new framework.
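How an HMM can smooth a noisy sequence of per-frame activity labels may be sketched with Viterbi decoding. The example below is a generic illustration rather than the model from [10]: the two states, the "sticky" transition probabilities and the emission probabilities are all assumed values.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # Log-probabilities avoid numerical underflow on long sequences.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    back = [{}]
    for obs in observations[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            cur[s] = (prev[best] + math.log(trans_p[best][s])
                      + math.log(emit_p[s][obs]))
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Assumed toy model: activities tend to persist, and the per-frame
# classifier reports the true activity 80% of the time.
states = ["walk", "run"]
start_p = {"walk": 0.5, "run": 0.5}
trans_p = {"walk": {"walk": 0.9, "run": 0.1},
           "run": {"walk": 0.1, "run": 0.9}}
emit_p = {"walk": {"walk": 0.8, "run": 0.2},
          "run": {"walk": 0.2, "run": 0.8}}

noisy = ["walk", "walk", "run", "walk", "walk"]
smoothed = viterbi(noisy, states, start_p, trans_p, emit_p)
print(smoothed)
```

With these assumed probabilities, the isolated "run" label in the middle of the sequence is treated as classifier noise, because switching activity twice is less likely than a single misclassified frame.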
M. S. Ryoo proposed an approach for human activity prediction [11]. The author formulated activity prediction as the probabilistic inference of human activities from video data. The prime objective of the method is to identify unfinished activities at an early stage of recognition. Among other applications, one important use of this technique is in surveillance systems. The author addressed the activity prediction problem by introducing new approaches into the conventional recognition process. The approach was tested, and the author concluded that it detects ongoing human activities at the very initial phase of the recognition process better than the previously existing approaches.
M. M. Sardeshmukh, M. T. Kolte, P. N. Chatur and D. S. Chaudhari presented a new 3D video dataset for recognizing human activities in video surveillance systems [12]. They considered both indoor and outdoor surveillance scenarios. The dataset makes it possible to measure how activity recognition performance varies with illumination, background and viewpoint. The authors provided complex data with RGB values that reflect real-world scenarios. The dataset contains ten activities performed by eleven subjects under four illumination conditions and from three viewpoints. They analyzed one hundred and ten videos. The additional depth information is helpful for performance evaluation in real-time applications. Future work can apply the presented method to more complex real-time scenarios.
C. Mani Sharma, A. K. Kushwaha, S. Nigam and A. Khare introduced a template-matching algorithm for human activity recognition in video processing [13]. They extracted the foreground objects in a scene through statistical modelling. The authors combined Motion History Images (MHI) with spatial silhouettes for activity recognition. They showed that the method works efficiently on the KTH database and on their own database, and that it performs well for static as well as dynamic activities. In their evaluation, the approach achieved an accuracy rate of 86.52%. Future work can extend this algorithm to a fully automatic video surveillance system.
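The Motion History Image mentioned above can be sketched with its standard update rule: a pixel where motion is detected is set to a maximum value tau, and every other pixel decays by one per frame. The sketch below is illustrative only. It detects motion by simple frame differencing, which is an assumption (the authors of [13] use statistical foreground modelling), and the function name, `tau` and `threshold` values are invented.

```python
def update_mhi(mhi, prev_frame, frame, tau=3, threshold=10):
    """Return the updated Motion History Image after one new frame.

    Frames and the MHI are 2-D lists of numbers with identical shape.
    """
    new_mhi = []
    for h_row, p_row, f_row in zip(mhi, prev_frame, frame):
        new_mhi.append([
            # Recent motion gets the maximum value tau; older motion fades.
            tau if abs(f - p) > threshold else max(0, h - 1)
            for h, p, f in zip(h_row, p_row, f_row)
        ])
    return new_mhi

# Toy 1x4 grayscale frames: a bright pixel moving one step right per frame.
frames = [
    [[255, 0, 0, 0]],
    [[0, 255, 0, 0]],
    [[0, 0, 255, 0]],
]
mhi = [[0, 0, 0, 0]]
for prev_frame, frame in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev_frame, frame)
print(mhi)
```

The resulting image encodes where motion happened and how recently, so brighter values mark the newest motion. A single such template can then be compared against stored activity templates.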
R. C. Miguel and J. M. Molina proposed a hybrid approach that merges sparse classification and multi-view learning [14]. They used multi-view learning, formulated as a feature fusion approach, to recognize human activities captured by multiple cameras. They adopted 2D motion descriptors to improve predictive accuracy, combining the information from the different cameras. The authors included L1-regularized sequence classifiers. The researchers evaluated multiple configurations in order to reduce computation time. On the IXMAS dataset, they obtained a better accuracy rate than the other existing approaches.
Conclusion:
In this survey, I have studied, analyzed and compared various approaches to human activity recognition. I have also identified the merits and demerits of each approach and its performance relative to other pre-existing human activity recognition systems.