Automated generation of synthetic person activity data for AI models training
Open Access
Article
Conference Proceedings
Authors: Dominik Breck, Max Schlosser, Rico Thomanek, Christian Roschke, Matthias Vodel, Marc Ritter
Abstract: Image and video analysis methods, such as the recognition of a person's activities on the basis of given image material, are of great importance both in research and in everyday life. Such complex methods mostly rely on deep learning approaches, which require training on a large data foundation. The main problem with datasets used for these methods is the acquisition and complex annotation of video data suitable for training a model. Further problems arising from the use of real-world data lie in non-compliance with basic data protection requirements and the one-sided representation of ethnic groups. A qualitatively and quantitatively inferior data basis also reduces the quality of the resulting model. The problems mentioned above increase the need for large amounts of data that are correctly annotated and at the same time suitable for real-world scenarios. Datasets that contain information on the camera settings used or on the filmed subjects are particularly rare due to the subsequent traceability of such data.
The approach presented in this paper describes a workflow that generates synthetic video data with extensive annotations in a completely automated way through several steps. The focus is on a real-time application that captures video clips by filming virtual scenarios. For the time being, the application focuses on activities performed by people who do not interact with other objects. The basis is provided by three-dimensional character models that are placed in the digital environment. For the recording, animations, which are also imported into the application, are played on the models. By arranging up to 100 virtual cameras in a hemisphere around the virtual person to be filmed, it is ensured that recordings are made from as many perspectives as possible. Metadata is stored for each recording, including the type of activity performed, the camera settings used, how the character's bone points change over the course of the animation, and data on the ethnicity and physical traits of the person being filmed. The application offers numerous configuration options via a graphical user interface and a command-line tool for setting up the recording and the metadata generated for each video clip. It is thus additionally possible to change the background and lighting of the scene, insert virtual objects, or adjust the speed of the animation.
With the help of a technical-functional prototype, it was shown that thousands of annotated video recordings of human activities can be created in a very short time. Such video data can be further processed by established models for the recognition and analysis of anatomical bone points and additionally validated with the help of the stored metadata. The developed workflow can be flexibly incorporated into different phases of model building and is thus suitable for initial training as well as for the optimization of existing training processes. In the future, the real-time application could be used for other generic image and video generation procedures, for example generating numerous image files of 3D objects suitable for training object classification systems.
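The abstract does not specify how the up to 100 virtual cameras are distributed over the hemisphere. A minimal sketch of one plausible placement scheme, a Fibonacci lattice on the upper hemisphere with every camera oriented towards the filmed subject, is given below; the function and parameter names are illustrative and not taken from the paper.

```python
import math

def hemisphere_cameras(n: int = 100, radius: float = 5.0, subject=(0.0, 0.0, 0.0)):
    """Distribute n camera positions roughly evenly over the upper hemisphere
    around a subject using a Fibonacci lattice. Each entry holds the camera
    position and the direction vector pointing at the subject."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    cameras = []
    for i in range(n):
        # z runs from just above the horizon to near the zenith (upper hemisphere only)
        z = (i + 0.5) / n                      # in (0, 1)
        r = math.sqrt(max(0.0, 1.0 - z * z))   # radius of the horizontal circle at height z
        theta = golden_angle * i
        x, y = r * math.cos(theta), r * math.sin(theta)
        pos = (subject[0] + radius * x,
               subject[1] + radius * y,
               subject[2] + radius * z)
        look_dir = tuple(s - p for s, p in zip(subject, pos))
        cameras.append({"position": pos, "look_at": subject, "direction": look_dir})
    return cameras

rig = hemisphere_cameras(n=100, radius=4.0)
print(len(rig), rig[0]["position"])
```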
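The exact metadata format is likewise not given in the abstract. The following sketch illustrates what a per-clip record covering the fields mentioned above (activity, camera settings, bone points, character attributes, scene configuration) could look like if serialized as JSON; all field names and values are hypothetical.

```python
import json

# Hypothetical per-clip metadata record, assuming JSON output; the schema is
# illustrative and not taken from the paper.
clip_metadata = {
    "clip_id": "clip_000231",
    "activity": "waving",
    "animation": {"name": "wave_right_hand", "speed": 1.0, "duration_s": 4.2},
    "camera": {
        "index": 17,
        "position": [2.1, 3.4, 1.8],
        "look_at": [0.0, 0.0, 1.0],
        "field_of_view_deg": 60.0,
        "resolution": [1920, 1080],
        "fps": 30,
    },
    "character": {
        "model": "female_adult_02",
        "ethnicity": "asian",
        "height_cm": 168,
        "build": "slim",
    },
    "scene": {"background": "office_interior", "lighting": "daylight", "props": []},
    # Per-frame 3D positions of selected skeleton joints driven by the animation;
    # truncated here for brevity.
    "keypoints": {
        "frame_0": {"head": [0.0, 0.0, 1.72], "left_wrist": [0.31, 0.12, 1.05]},
    },
}

with open("clip_000231.json", "w") as f:
    json.dump(clip_metadata, f, indent=2)
```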
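For the validation step described above, the stored bone-point annotations can serve as ground truth against the output of an established pose-estimation model. A simple mean per-joint position error, assuming both predictions and annotations are given as joint-name-to-coordinate mappings, could be computed as in this sketch; the example values are purely illustrative.

```python
import math

def mean_per_joint_error(predicted: dict, ground_truth: dict) -> float:
    """Average Euclidean distance between predicted joint positions and the
    ground-truth positions stored in a clip's metadata, over shared joints."""
    joints = set(predicted) & set(ground_truth)
    if not joints:
        raise ValueError("no common joints to compare")
    total = sum(math.dist(predicted[j], ground_truth[j]) for j in joints)
    return total / len(joints)

# Illustrative comparison for two joints:
pred = {"head": [0.02, 0.01, 1.70], "left_wrist": [0.30, 0.10, 1.00]}
gt   = {"head": [0.00, 0.00, 1.72], "left_wrist": [0.31, 0.12, 1.05]}
print(round(mean_per_joint_error(pred, gt), 3))
```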
Keywords: synthetic data generation, video activity detection, image and video analysis, deep learning
DOI: 10.54941/ahfe1004182