ETL and ML Forecasting Modeling Process Automation System
Abstract
Given the importance of online retailers in the market, sales forecasting has become an essential strategic consideration. Modern machine learning tools help many online retailers forecast sales, but the models need refinement and automation to improve efficiency and productivity. If an automated function captures historical data and executes the forecasting models, the company spends less time and fewer human resources managing the forecasting system, and the marketing department gains more flexible sales forecasting. This paper proposes an automated weekly sales forecasting system that integrates an Extract-Transform-Load (ETL) data processing pipeline with a machine learning forecasting model and sends the outcomes as messages. For this study, the data is obtained for an online women's shoe retailer from three data sources (AWS Redshift, AWS S3, and Google Sheets). The system collects 120 weeks of sales data, passes it through the ETL process, and runs the machine learning forecasting model to forecast sales of the retailer's products for the next week. The model is built using a random forest regressor. The top 25 products in the forecasting results are selected and sent to the owner's email for further market evaluation. The system is built as a Directed Acyclic Graph (DAG) using Python scripts on Apache Airflow. To facilitate management of the system, the authors set up Apache Airflow in a Docker container. The whole process requires no human monitoring or management. If an error occurs while the project runs on Airflow, the system notifies the project owner so that the cause can be inspected.
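The task ordering described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the task names, the `PIPELINE` graph, and every function below are hypothetical, and a naive moving-average forecaster stands in for the random forest regressor so that the sketch runs without third-party packages. It also assumes that "top 25" means the products with the highest forecast values.

```python
from graphlib import TopologicalSorter
from statistics import mean

# Hypothetical task graph: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_redshift": set(),
    "extract_s3": set(),
    "extract_sheets": set(),
    "transform": {"extract_redshift", "extract_s3", "extract_sheets"},
    "forecast": {"transform"},
    "select_top": {"forecast"},
    "send_email": {"select_top"},
}


def run_order(graph):
    """Return one valid execution order for the task graph."""
    return list(TopologicalSorter(graph).static_order())


def forecast_next_week(weekly_sales, window=4):
    """Stand-in forecaster: mean of the last `window` weeks per product.

    The paper uses a random forest regressor; this placeholder only keeps
    the sketch free of third-party dependencies.
    """
    return {product: mean(series[-window:])
            for product, series in weekly_sales.items()}


def select_top(forecasts, n=25):
    """Return the n products with the highest forecast, largest first."""
    ranked = sorted(forecasts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


def format_message(top):
    """Render the ranked products as a plain-text message body."""
    lines = [f"{rank}. {product}: {value:.1f}"
             for rank, (product, value) in enumerate(top, start=1)]
    return "\n".join(lines)
```

In an Airflow deployment, each key of the graph would typically become a task and the dependency sets would be expressed between tasks (for example with the `>>` operator). The topological sort here only demonstrates that the three extraction steps precede the transformation, which precedes forecasting, selection, and messaging.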
Keywords: Extract, Transform, Load (ETL) process, Machine Learning, Random Forest Regressor, Forecasting Model, Online Retailer, Apache Airflow, Docker, AWS
DOI: 10.54941/ahfe1003775


AHFE Open Access