Assessment of the Capabilities of Multimodal Large Language Models in Locating and Resolving Ambiguities during Human-Robot Teaming
Abstract
The effectiveness of human-robot teaming is bounded by the quality of communication, which includes maintaining a shared context across the team. Our work studies how well Multimodal Large Language Models (MMLLMs) identify and resolve ambiguities, with the goal of establishing a clear context for teams. We developed a benchmark of images paired with ambiguous queries to replicate a teaming context with a human collaborator, and we evaluated several MMLLMs on this benchmark to assess their capabilities in identifying and resolving ambiguities. In our testing framework, the MMLLM processes a command accompanied by an image, and the framework then evaluates the model's performance in detecting and resolving ambiguities. To create a shared context between the human and robot collaborators, our system provides a picture that captures the robot's viewpoint together with a query from the human collaborator. The chosen MMLLM processes this information and outputs the portions of the query that are ambiguous along with suggestions for clarification. A corrected version of the prompt may then be sent to a planner or a system that provides actionable commands. To evaluate each MMLLM's performance, we compare the ambiguities identified by the model with the expected ambiguities from the datasets. The top-performing MMLLM achieved an accuracy of 81%.
Keywords: Human-AI Collaboration, Ambiguity Resolution, Multimodal Large Language Models
DOI: 10.54941/ahfe1006048
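The comparison step described in the abstract (ambiguities identified by the model versus expected ambiguities) can be illustrated with a minimal sketch. The abstract does not specify how matches are scored, so everything here is an assumption made for illustration: the names `BenchmarkItem`, `normalize`, `item_correct`, and `accuracy` are hypothetical, and an item is counted as correct only when the normalized set of identified phrases equals the expected set. The paper's actual scoring rule may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkItem:
    """One benchmark entry (hypothetical structure, not taken from the paper)."""
    query: str                 # ambiguous command from the human collaborator
    expected: frozenset        # phrases annotated as ambiguous for this query


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as misses."""
    return " ".join(phrase.lower().split())


def item_correct(identified, expected) -> bool:
    """Assumed rule: correct only if the identified set equals the expected set."""
    return {normalize(p) for p in identified} == {normalize(p) for p in expected}


def accuracy(items, predictions) -> float:
    """Fraction of items whose identified ambiguities match the expected ones."""
    if not items:
        return 0.0
    hits = sum(
        item_correct(pred, item.expected)
        for item, pred in zip(items, predictions)
    )
    return hits / len(items)


if __name__ == "__main__":
    # Made-up example data; these are not entries from the paper's benchmark.
    items = [
        BenchmarkItem("Pick up the cup", frozenset({"the cup"})),
        BenchmarkItem("Move it over there", frozenset({"it", "over there"})),
    ]
    # Stand-in for model output: one list of flagged phrases per item.
    predictions = [["The  cup"], ["it"]]
    print(accuracy(items, predictions))
```

Exact set equality is a strict choice; a partial-credit measure such as per-phrase precision and recall would be an equally plausible reading of the abstract.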
AHFE Open Access