CLIP-Based Search Engine for Retrieval of Label-Free Images Using a Text Query
Abstract
In January 2021, OpenAI released Contrastive Language-Image Pre-Training (CLIP), a model that learns state-of-the-art (SOTA) image representations from scratch on a dataset of 400 million (image, text) pairs collected from the Internet. Natural language can be used to reference the visual concepts the model has learned (or to describe new ones), which enables zero-shot transfer of the model to downstream tasks. One such application is looking up images with natural language queries, which grows in importance as the amount of visual information created by people keeps increasing. This paper explores the application of the CLIP model to the image search problem. It proposes a practical and scalable image search implementation featuring a cache layer backed by the SQLite 3 relational database management system (RDBMS), which makes repeated image searches performant. The method allows efficient image retrieval with a text query over large image datasets. It achieves 32.27% top-1 accuracy on the ImageNet-1k training set (1.28 million images) and 55.15% top-1 accuracy on the CIFAR-100 test set (10,000 images). Image indexing time scales linearly with the number of images, while image search time grows only modestly. On an Apple M1 Max CPU, indexing 50,000 images takes 19 minutes and 24 seconds, and indexing 1,281,167 images takes 8 hours, 31 minutes, and 26 seconds. On the same CPU, a query over 50,000 images executes in 4 seconds, and the same query over 1,281,167 images executes in 11 seconds.
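The cache-then-search idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the table schema, the function names (`open_cache`, `index_images`, `search`), and the brute-force scan over cached vectors are all assumptions made for illustration. In particular, `toy_encode` is a hypothetical stand-in for the CLIP image and text encoders, so that the example runs without downloading a model; a real system would replace it with CLIP embeddings.

```python
import math
import re
import sqlite3
from array import array

# Toy vocabulary for the stand-in encoder (assumption, not part of CLIP).
VOCAB = ["cat", "dog", "sofa", "park", "car"]


def toy_encode(text):
    """Hypothetical stand-in for a CLIP encoder: unit-length bag-of-words vector."""
    tokens = re.findall(r"[a-z]+", text.lower())
    vec = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def open_cache(db_path=":memory:"):
    """Open (or create) the SQLite cache that maps image paths to embeddings."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "path TEXT PRIMARY KEY, vector BLOB NOT NULL)"
    )
    return conn


def index_images(conn, paths, encode_image):
    """Encode only images missing from the cache; return how many were encoded."""
    encoded = 0
    for path in paths:
        row = conn.execute(
            "SELECT 1 FROM embeddings WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            # Store the embedding as a float32 blob.
            blob = array("f", encode_image(path)).tobytes()
            conn.execute(
                "INSERT INTO embeddings (path, vector) VALUES (?, ?)", (path, blob)
            )
            encoded += 1
    conn.commit()
    return encoded


def search(conn, query, encode_text, top_k=3):
    """Rank cached images by dot product with the (unit-length) query vector."""
    query_vec = encode_text(query)
    scored = []
    for path, blob in conn.execute("SELECT path, vector FROM embeddings"):
        vec = array("f")
        vec.frombytes(blob)
        score = sum(q * v for q, v in zip(query_vec, vec))
        scored.append((score, path))
    scored.sort(reverse=True)
    return [(path, score) for score, path in scored[:top_k]]


# Demo: file names double as "image content" for the toy encoder.
conn = open_cache()
paths = ["cat_on_sofa.jpg", "dog_in_park.jpg", "red_car.jpg"]
first_pass = index_images(conn, paths, toy_encode)
second_pass = index_images(conn, paths, toy_encode)  # nothing left to encode
results = search(conn, "a photo of a cat", toy_encode, top_k=1)
print(first_pass, second_pass, results[0][0])
```

In this sketch the second indexing pass encodes nothing because every path is already cached, which is the property that makes repeated searches over the same image collection cheap: only the text query has to be encoded at search time.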
Keywords: artificial intelligence, computer vision, natural language processing, image lookup, transformers
DOI: 10.54941/ahfe1004021


AHFE Open Access