AI-powered real-time analysis of human activity in videos via smartphone.
Authors: Rico Thomanek, Benny Platte, Matthias Baumgart, Christian Roschke, Marc Ritter
Abstract: A major focus of computer vision research is the recognition of human activity from the visual information in audiovisual data using artificial intelligence. Researchers are currently exploring image-based approaches using 3D CNNs, RNNs, or hybrid models, with the aim of learning multiple levels of representation and abstraction that enable fully automated feature extraction and, building on it, activity analysis. These architectures require powerful hardware to approach real-time processing, which makes them difficult to deploy on smartphones. Yet video is increasingly recorded with smartphones, so classifying and tagging the human activities performed while the recording is still in progress would be useful. Conceivable mobile use cases include checking that motion sequences are performed correctly in the sports and health sector, and monitoring security-relevant environments (e.g., demonstrations, festivals) with automated alerting. This requires a system architecture efficient enough to perform real-time analysis despite limited hardware power. This paper addresses skeleton-based activity recognition on smartphones, in which the motion vectors of detected skeleton points are analyzed for their spatial and temporal expression instead of pixel-based information. The 3D skeleton points of a recognized person are extracted using the AR framework integrated in the operating system, and their motion data is analyzed in real time by a custom-trained RNN. Because this approach is purely numerical, processing and activity classification are fast enough for real-time use. The resulting system recognizes a person in a live video stream recorded with a smartphone and classifies the activity performed. Several successful field tests show both that the described approach works in principle and that it can be transferred to a resource-constrained mobile environment.
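The skeleton-based pipeline described in the abstract can be illustrated with a minimal sketch: per-joint motion vectors are computed as frame-to-frame displacements and passed through a small recurrent network. This is not the authors' implementation. The joint count, hidden size, number of classes, the plain Elman-style RNN cell, and the random (untrained) weights are all assumptions made for illustration, and the synthetic input stands in for skeleton points that the AR framework would supply.

```python
import math
import random


def motion_vectors(frames):
    """Per-joint displacement between consecutive frames.

    frames: list of frames, each a list of (x, y, z) joint positions.
    Returns len(frames) - 1 flat vectors of length 3 * num_joints.
    """
    feats = []
    for prev, curr in zip(frames, frames[1:]):
        feats.append(
            [c - p for pj, cj in zip(prev, curr) for p, c in zip(pj, cj)]
        )
    return feats


def rnn_classify(features, w_in, w_rec, w_out):
    """Elman-style RNN forward pass; returns the index of the top class."""
    hidden = [0.0] * len(w_rec)
    for x in features:
        # New hidden state depends on the current input and the old state.
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wr * hk for wr, hk in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return scores.index(max(scores))


def random_matrix(rows, cols, rng):
    """Small random weights; a real system would load trained weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes, chosen only to keep the example small.
NUM_JOINTS, HIDDEN, NUM_CLASSES = 4, 8, 3
rng = random.Random(0)

# Synthetic 10-frame clip standing in for tracked 3D skeleton points.
frames = [
    [(rng.random(), rng.random(), rng.random()) for _ in range(NUM_JOINTS)]
    for _ in range(10)
]

feats = motion_vectors(frames)
w_in = random_matrix(HIDDEN, 3 * NUM_JOINTS, rng)
w_rec = random_matrix(HIDDEN, HIDDEN, rng)
w_out = random_matrix(NUM_CLASSES, HIDDEN, rng)
label = rnn_classify(feats, w_in, w_rec, w_out)
```

Each frame contributes only 3 × NUM_JOINTS numbers rather than a full image, which is why a purely numerical approach of this kind is inexpensive compared with pixel-based models.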
Keywords: artificial intelligence system, computer vision