The Removal of Irrelevant Human Factors in a Multi-Review Corpus through Text Filtering
Abstract
Generating a high-quality explainable summary of a multi-review corpus can save people the time of reading every review. With natural language processing and text clustering, one can generate both abstractive and extractive summaries of a corpus containing up to 967 product reviews (Moody et al. 2022). However, the overall quality of the summaries needs further improvement. Noting that the online reviews in the corpus come from a diverse population, we remove irrelevant human factors through pre-processing. Applying available pre-trained models together with reference-based and reference-free metrics, we automatically filter out noise in each review prior to summary generation. Our computational experiments show that the overall quality of an explainable summary can be significantly improved when it is generated from the pre-processed corpus rather than from the original one. We suggest applying available high-quality pre-trained tools to filter noise rather than starting from scratch. Although this work addresses one specific multi-review corpus, the methods and conclusions should be helpful for generating summaries of other multi-review corpora.
Keywords: natural language processing, text summarization, text filtering, automatic text filtering, score-based filtering
DOI: 10.54941/ahfe1003766
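
The score-based filtering named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring models or thresholds, so everything here is an assumption: the function names (`relevance_score`, `filter_corpus`), the threshold of 0.5, and above all the score itself, which is a simple document-frequency heuristic standing in for the pre-trained models and reference-based or reference-free metrics the paper uses. The idea shown is only the general shape: score each sentence of each review, then drop sentences that score below a threshold before summarization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(review):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def relevance_score(sentence, doc_freq, n_reviews):
    """Hypothetical reference-free score (NOT the paper's metric):
    the average fraction of reviews that contain each token."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(doc_freq[t] for t in tokens) / (len(tokens) * n_reviews)


def filter_corpus(reviews, threshold=0.5, score_fn=relevance_score):
    """Keep only sentences whose score meets the (assumed) threshold."""
    # Document frequency: in how many reviews each token appears.
    doc_freq = Counter()
    for review in reviews:
        doc_freq.update(set(tokenize(review)))

    filtered = []
    for review in reviews:
        kept = [
            s for s in split_sentences(review)
            if score_fn(s, doc_freq, len(reviews)) >= threshold
        ]
        filtered.append(" ".join(kept))
    return filtered


if __name__ == "__main__":
    reviews = [
        "The battery lasts long. My cousin visited yesterday.",
        "The battery is great.",
        "Battery life is long.",
    ]
    for cleaned in filter_corpus(reviews):
        print(cleaned)
```

Passing a different `score_fn` is where a pre-trained model or an established metric would plug in; the filtering loop itself would stay the same.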


AHFE Open Access