Interactive Visualization for Human-in-the-Loop 3D-to-2D Pose Annotation
Abstract
Aligning 3D objects with their poses in 2D images has traditionally relied on manual trial-and-error rendering, where annotators repeatedly adjust parameters until the object appears to match the scene. This process is slow, labor-intensive, and cognitively demanding, leading to annotator fatigue and inconsistent results. Such tedious workflows are difficult to scale across entire video sequences, and the increased likelihood of error limits the reliability of the generated data.

To address this gap, we present an interactive 3D-to-2D visualization and annotation tool that aids accurate human annotation of 3D object poses. To our knowledge, this is the first system that allows users to directly manipulate 3D objects within a 2D real-world scene, providing an intuitive 3D graphical user interface for annotating object positions and orientations. The tool integrates visual cues with spatial context to enable robust 6D pose annotation. By offering real-time visualization, depth estimation, and both single-object and linked multi-object pose annotation, the proposed tool establishes a practical foundation for generating accurate pose data. By reducing the burden of manual trial-and-error and making pose annotation more intuitive, the tool advances human involvement in dataset generation, enabling researchers to create the data needed to drive progress in AI and vision-based applications more efficiently and accurately.

The highlights of the proposed interactive augmented reality tool for 6D pose annotation are summarized below:

1. Immediate and Intuitive Feedback: The interactive visualization provides immediate, continuous feedback, reducing cognitive load and helping users form a clear mental model of the 3D-2D alignment.
2. Cognitive Support for 3D Reasoning: By making depth cues explicit, the system compensates for human perceptual limitations in interpreting 3D structure from 2D views, minimizing errors caused by ambiguity.
3. Precision with Reduced Frustration: The single-object annotation mode enables focused, high-precision interaction, reducing task complexity and minimizing accidental misalignment.
4. Linking Poses with Context Preservation: By linking multi-object poses in the annotation tool, the system maintains spatial consistency, helping users preserve context and avoid repetitive manual corrections. This reduces annotation fatigue and supports efficient workflows in complex scenes.

This interactive tool is open-source and publicly available at https://github.com/InteractiveGL/vision6D.
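As an illustration of what a 3D-to-2D alignment involves, the following is a minimal sketch of projecting model vertices into an image under a 6D pose (a rotation plus a translation) using a standard pinhole camera model. It is not taken from the tool's code base: the function names, the choice of a rotation about the z-axis, and the intrinsic values (fx, fy, cx, cy) are all assumptions made for illustration only.

```python
import math


def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_pose(rotation, translation, point):
    """Map a point from object to camera coordinates: R @ p + t."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def project(point_cam, fx, fy, cx, cy):
    """Project a camera-space point to pixel coordinates (pinhole model)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is at or behind the camera plane")
    return (fx * x / z + cx, fy * y / z + cy)


def project_model(vertices, rotation, translation, fx, fy, cx, cy):
    """Project every model vertex under the given 6D pose."""
    return [
        project(apply_pose(rotation, translation, v), fx, fy, cx, cy)
        for v in vertices
    ]


if __name__ == "__main__":
    # Hypothetical pose and intrinsics, chosen only for the example.
    vertices = [(0.1, -0.1, 0.0), (1.0, 0.0, 0.0)]
    pose_rotation = rotation_z(0.0)
    pose_translation = (0.0, 0.0, 2.0)
    pixels = project_model(
        vertices, pose_rotation, pose_translation, 1000.0, 1000.0, 320.0, 240.0
    )
    for vertex, pixel in zip(vertices, pixels):
        print(vertex, "->", pixel)
```

In a trial-and-error workflow, an annotator would edit the rotation and translation values, re-render, and compare the projected points against the image by eye. An interactive tool of the kind described above replaces that loop with direct manipulation and continuous visual feedback.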
Keywords: Interactive Annotation, 3D-to-2D Visualization, Pose Estimation, Augmented Reality Interfaces, Annotation Tools
DOI: 10.54941/ahfe1006895


AHFE Open Access