Developing AI Video Analysis Systems to Explore Human Behavior in Infant and Ethnographic Footage

Open Access
Conference Proceedings
Authors: Yuta Ogai, Yuto Ono, Yasushi Noguchi, Sayaka Tohyama, Hideaki Kondo, Masayuki Yamada

Abstract: The advancement of Information and Communication Technology (ICT) has enabled the storage of large volumes of video data. In recent years, research has focused on technologies for extracting this video data in formats suitable for specific purposes. For instance, it is possible to derive insights about developmental processes from daily life video data or to extract specific segments of ethnographic footage for artistic expression.

Scholars stress the importance of analyzing daily videos of infants to understand their developmental processes, but watching all infants' daily videos would require an enormous amount of human effort. Therefore, we considered employing AI technology, which has advanced rapidly in recent years. This presentation introduces two examples of our AI-based video analysis to demonstrate the possibilities and challenges involved.

In our first study, focused on an infant, we examined the possibility of using object-detection, action-recognition, and caption-generation AI to detect infant movements for the purpose of developmental research and monitoring. The object-detection AI, YOLOv8, extracted images surrounding the area detected as a human in infant videos. The caption-generation AIs CATR and BLIP were then applied to each image to evaluate whether they could detect infants and provide information about their behavior. SlowFast, an action-recognition AI, was also used to detect infant behavior in the videos. On the basis of the results of these studies of individual AIs, we will discuss the potential of combining them. Ethnographic video data presents similar challenges.

Another example of using AI is the experiential video installation "Diverse and Universal Camera," a media artwork that uses AI to analyze ethnographic footage. This project employed SlowFast and YOLOv8 to develop a system that automatically labels the actions of people and objects in videos and enables efficient video retrieval for exhibitions of ethnographic footage archives.

In these examples, AI tools process volumes of video data that are too large for humans to manage, extracting the parts of the video that merit human attention to provide a better understanding of human behavior. One challenge is that installing multiple cameras in a household to capture everyday situations often necessitates reducing the video resolution due to storage and network bandwidth constraints. Moreover, because of the need to cover wide areas, an infant frequently appears small in the video, and other people and objects commonly appear alongside the infant, such as family members and the infant's bedding and toys. Ethnographic footage is also difficult to analyze, as it is typically old, black-and-white, and low in resolution. Furthermore, each segment of footage is shot with a different theme and therefore emphasizes different subjects or objects.

We are investigating methods of using AI to address these problems, such as cropping individual image segments, classification of video by generated captions, and diffusion in the time direction. We believe that these methods can be applied to other video types, significantly enhancing the potential for research that analyzes everyday human behavior.
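The crop-then-caption step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Detection` record, the function names, the 25% margin, the confidence threshold, and the keyword list are all assumptions made for illustration. In a real pipeline the boxes would come from a detector such as YOLOv8 and the captions from a model such as CATR or BLIP; here both are supplied by hand so the logic can be read on its own.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output; a real system would fill this from YOLOv8."""
    label: str
    confidence: float
    box: Box


def padded_crop(box: Box, frame_w: int, frame_h: int,
                margin: float = 0.25) -> Tuple[int, int, int, int]:
    """Grow a box by `margin` of its width/height per side, clipped to the frame."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    left = max(0, int(x1 - pad_x))
    top = max(0, int(y1 - pad_y))
    right = min(frame_w, int(x2 + pad_x))
    bottom = min(frame_h, int(y2 + pad_y))
    return left, top, right, bottom


def person_crops(detections: Iterable[Detection], frame_w: int, frame_h: int,
                 min_conf: float = 0.5,
                 margin: float = 0.25) -> List[Tuple[int, int, int, int]]:
    """Keep confident 'person' detections and return their padded crop regions."""
    return [
        padded_crop(d.box, frame_w, frame_h, margin)
        for d in detections
        if d.label == "person" and d.confidence >= min_conf
    ]


def mentions_infant(caption: str,
                    keywords: Sequence[str] = ("baby", "infant", "toddler")) -> bool:
    """Check whether a generated caption contains any infant-related keyword."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return any(k in words for k in keywords)
```

Padding the box before cropping gives the captioning model some surrounding context (bedding, toys, a caregiver's hands), which may matter when the infant occupies only a small part of a wide-angle, low-resolution frame. The keyword check is a deliberately simple stand-in for classifying video by generated captions.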

Keywords: AI Video Analysis, Infant Behavior, Ethnographic Footage

DOI: 10.54941/ahfe1004694
