Comparison of AI Model Serving Efficiency: Response Time and Memory Usage Analysis
Abstract
Demand for Natural Language Processing (NLP) models is increasing, making research into efficient serving methods crucial. In particular, cost efficiency and rapid response times are key factors when serving NLP models. This paper compares methods for optimizing NLP model serving: three approaches, using a REST API, TensorFlow Serving, and TensorFlow.js, were implemented, and each method's response time and memory usage were measured. The results are intended to provide foundational guidelines for improving the efficiency of NLP model serving, minimizing potential issues in the serving process and improving user experience.
Keywords: TensorFlow Serving, TensorFlow.js, Node.js, LSTM, GRU
DOI: 10.54941/ahfe1005580
AHFE Open Access