Comparative Analysis of RGB-based Eye-Tracking for Large-Scale Human-Machine Applications
Authors: Nicholas Caporusso, Trung Cao, Brett Thaman
Abstract: Eye-Tracking (ET) has become an established technology that enables using an individual's gaze as an input signal to support a variety of applications in the context of Human-Computer Interaction (HCI). To this end, sensing devices such as infrared (IR) or standard RGB cameras (e.g., webcams) acquire images of the user's eyes. Then, image processing algorithms identify the pupils and evaluate their movement in real time. In infrastructure-based ET, sensors are attached to or incorporated into the display of a computer, and they are calibrated to detect the position of the user's gaze over the surface of the screen. In recent years, wearable ET devices have also been introduced to support mobile scenarios. They utilize a set of cameras that simultaneously acquire the user's eyes and record the observed scene (i.e., the user's point of view). Several systems use ET for user research, market studies, and enhancing input in hands-free contexts. However, prior research showed that IR sensors significantly outperform RGB cameras. As a result, current ET applications require dedicated hardware, which limits the adoption of ET technology in the most common applications. Although several research studies have focused on democratizing access to ET by improving the speed, accuracy, and reliability of RGB sensors, only a few systems are available today, and their performance has not been tested extensively. Consequently, despite its applicability in a variety of scenarios and its potential for improving HCI tasks, ET has been integrated into only a few applications. The goal of our work is to achieve reliable and high-performance ET using standard RGB cameras, to extend the potential user base of ET, and to foster the development of large-scale HCI applications that do not require additional, dedicated hardware. In our research, we primarily focused on infrastructure-based ET.
We analyzed currently available ET systems based on IR sensors and RGB cameras, and we compared their performance across a variety of settings (e.g., camera resolution, light conditions, and noise). In this paper, we present a detailed report of our findings, describe the main issues and challenges in realizing ET with RGB cameras, and address their root causes. Moreover, we present the results of a study in which we explored the use of Machine Learning (ML) for improving the accuracy of gaze tracking: we compare the performance of landmark detection algorithms and report their limitations in terms of accuracy, reliability, and speed. Furthermore, we introduce a novel ML-based image processing pipeline and calibration routine. Our proposed solution is based on a two-step process that integrates landmark detection and relative pupil calibration to improve overall accuracy with an optimal trade-off in terms of speed. We identify camera resolution, feature selection, calibration, and training settings that produce results comparable to IR sensors. Finally, we describe the advantages of the proposed system and how it can be utilized to deploy robust ET applications that leverage standard RGB cameras instead of requiring dedicated hardware.
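To illustrate the second step of the two-step process described above, the following is a minimal sketch of a relative pupil calibration routine. It assumes that a landmark detector has already produced pupil positions normalized with respect to eye-corner landmarks; the function names, the second-order polynomial feature set, and the synthetic calibration points are illustrative, not the authors' implementation.

```python
import numpy as np

def design_matrix(p):
    # Second-order polynomial features of the normalized pupil offset (px, py).
    px, py = p[:, 0], p[:, 1]
    return np.stack([np.ones_like(px), px, py, px * py, px**2, py**2], axis=1)

def fit_calibration(pupil_offsets, screen_points):
    # Least-squares fit mapping pupil-offset features to screen coordinates,
    # estimated from the samples collected during the calibration routine.
    A = design_matrix(np.asarray(pupil_offsets, dtype=float))
    W, *_ = np.linalg.lstsq(A, np.asarray(screen_points, dtype=float), rcond=None)
    return W

def predict_gaze(W, pupil_offsets):
    # Map new pupil offsets to on-screen gaze coordinates.
    return design_matrix(np.asarray(pupil_offsets, dtype=float)) @ W
```

In a typical calibration routine, the user fixates a small grid of on-screen targets (e.g., 3x3), the corresponding pupil offsets are recorded, and `fit_calibration` estimates the mapping used for all subsequent gaze predictions.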
Keywords: Eye-Tracking, Human-Computer Interaction, Machine Learning