Interactive Visualization for Human-in-the-Loop 3D-to-2D Pose Annotation
Open Access
Article
Conference Proceedings
Authors: Yike Zhang, Eduardo Davalos
Abstract: Aligning 3D objects with their poses in 2D images has traditionally relied on manual trial-and-error rendering, where annotators repeatedly adjust parameters until the object appears to match the scene. This process is not only slow and labor-intensive but also cognitively demanding, leading to fatigue and inconsistent results. The reliance on such tedious workflows makes it difficult to scale annotation across entire video sequences, while the increased likelihood of error limits the reliability of the generated data.

To address this gap, we present an interactive 3D-to-2D visualization and annotation tool that supports accurate human annotation of 3D object poses. To our knowledge, this is the first system that allows users to directly manipulate 3D objects within a 2D real-world scene, providing an intuitive 3D graphical user interface for annotating object positions and orientations. The tool integrates visual cues with spatial context to enable robust 6D pose annotation. By offering real-time visualization, depth estimation, and both single- and multi-object linked pose annotation, it establishes a practical foundation for generating accurate pose data. By reducing the burden of manual trial and error and making pose annotation more intuitive, the tool advances human involvement in dataset generation, enabling researchers to create the data needed to drive progress in AI and vision-based applications more efficiently and accurately.

The highlights of our proposed augmented reality 6D pose annotation interactive tool are summarized below:
1. Immediate and Intuitive Feedback: The interactive visualization provides immediate, continuous feedback, reducing cognitive load and supporting users in forming a clear mental model of the 3D-to-2D alignment.
2. Cognitive Support for 3D Reasoning: By making depth cues explicit, the system compensates for human perceptual limitations in interpreting 3D structure from 2D views, minimizing errors caused by ambiguity.
3. Precision with Reduced Frustration: The single-object annotation mode enables focused, high-precision interaction, reducing task complexity and minimizing accidental misalignment.
4. Linking Poses with Context Preservation: By linking multi-object poses in the annotation tool, the system maintains spatial consistency, helping users preserve context and avoid repetitive manual corrections. This reduces annotation fatigue and supports efficient workflows in complex scenes.

This interactive tool is open-source and publicly available at https://github.com/InteractiveGL/vision6D.
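The 6D pose the abstract refers to is a rigid transform: three rotation parameters and three translation parameters that place the 3D object so its rendering overlays the 2D image. As a minimal sketch (not the paper's implementation), the snippet below builds such a transform from Euler angles and projects a model point into pixel coordinates with an assumed pinhole camera; the focal length and principal point values are hypothetical.

```python
import math

def pose_matrix(rotation_deg, translation):
    """4x4 rigid transform from XYZ Euler angles (degrees) and a translation."""
    rx, ry, rz = (math.radians(a) for a in rotation_deg)
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    # Combined rotation R = Rz @ Ry @ Rx, written out element by element
    rot = [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy,     cy * sx,                cy * cx],
    ]
    tx, ty, tz = translation
    return [rot[0] + [tx], rot[1] + [ty], rot[2] + [tz], [0.0, 0.0, 0.0, 1.0]]

def project(point, pose, K):
    """Project one 3D model point to pixel coordinates via a pinhole camera K."""
    homo = list(point) + [1.0]
    # Transform into the camera frame using the top three rows of the pose
    cam = [sum(r[i] * v for i, v in enumerate(homo)) for r in pose[:3]]
    u = (K[0][0] * cam[0] + K[0][2] * cam[2]) / cam[2]  # perspective divide
    v = (K[1][1] * cam[1] + K[1][2] * cam[2]) / cam[2]
    return u, v

# Hypothetical camera: 800 px focal length, principal point (320, 240)
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
# Object 10 cm to the right of and 2 m in front of the camera, no rotation
pose = pose_matrix([0.0, 0.0, 0.0], [0.1, 0.0, 2.0])
u, v = project([0.0, 0.0, 0.0], pose, K)
# the object origin lands at pixel (360.0, 240.0)
```

Adjusting the six pose parameters and re-projecting is exactly the feedback loop that trial-and-error annotation performs by hand, and that an interactive visualization makes continuous.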
Keywords: Interactive Annotation, 3D-to-2D Visualization, Pose Estimation, Augmented Reality Interfaces, Annotation Tools
DOI: 10.54941/ahfe1006895
AHFE Open Access