Automated Visual Story Synthesis with Character Trait Control
Abstract
Visual storytelling is an art form that has been used for centuries to communicate stories, convey messages, and evoke emotions. Images and text must work in harmony to create a compelling narrative experience. With the rise of text-to-image generation models such as Stable Diffusion, automatically creating illustrations for stories has become a promising research direction. However, these diffusion models are usually developed to generate a single image, which leads to a lack of consistency between figures and objects across different illustrations of the same story. This consistency is especially important in stories with human characters. This work introduces a novel technique for creating consistent human figures in visual stories. It proceeds in two steps. The first step is to collect human portraits with various identifying characteristics, such as gender and age, that describe the character. The second step is to use this collection to train DreamBooth to generate a unique token ID for each type of character. These IDs can then replace the names of the story characters in the image-generation process. By combining these two steps, we can create controlled human figures for various visual storytelling contexts.
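The name-replacement step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the trait categories, the token strings (such as "chr01 girl"), and the helper name build_prompt are all hypothetical, chosen only to show how a character name in a story sentence might be swapped for a trait-specific identifier before the sentence is passed to an image-generation model.

```python
import re
from typing import Dict, Tuple

# Hypothetical mapping from (gender, age group) to an identifier token.
# In the described approach each token would come from DreamBooth training
# on portraits with those traits; the strings here are placeholders.
TRAIT_TOKENS: Dict[Tuple[str, str], str] = {
    ("female", "child"): "chr01 girl",
    ("male", "child"): "chr02 boy",
    ("female", "elderly"): "chr03 woman",
    ("male", "elderly"): "chr04 man",
}


def build_prompt(sentence: str, characters: Dict[str, Tuple[str, str]]) -> str:
    """Replace each character name in a story sentence with its trait token."""
    prompt = sentence
    for name, traits in characters.items():
        token = TRAIT_TOKENS[traits]
        # Match the name as a whole word so "Tom" does not match "Tomorrow".
        pattern = rf"\b{re.escape(name)}\b"
        # A function replacement avoids any backslash handling in the token.
        prompt = re.sub(pattern, lambda _match, t=token: t, prompt)
    return prompt


# Assumed story cast: names mapped to (gender, age group).
cast = {"Anna": ("female", "child"), "Tom": ("male", "elderly")}
example = build_prompt("Anna walks with Tom in the park.", cast)
```

Because every illustration of a given character would then be prompted with the same token, the generated figure is expected to stay visually consistent across the story, which is the property the abstract targets.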
Keywords: Visual Storytelling, Stable Diffusion, DreamBooth, Story Synthesis
DOI: 10.54941/ahfe1003275


AHFE Open Access