The Removal of Irrelevant Human Factors in a Multi-Review Corpus through Text Filtering

Open Access
Conference Proceedings
Authors: Aaron Moody, Makenzie Spurling, Chenyi Hu

Abstract: A high-quality, explainable summary of a multi-review corpus can save readers the time of reading every review. With natural language processing and text clustering, both abstractive and extractive summaries can be generated for a corpus containing up to 967 product reviews (Moody et al. 2022). However, the overall quality of these summaries needs further improvement. Noting that the online reviews in the corpus come from a diverse population, we remove irrelevant human factors through pre-processing. Applying available pre-trained models together with reference-based and reference-free metrics, we automatically filter out noise in each review prior to summary generation. Our computational experiments show that an explainable summary generated from such a pre-processed corpus can be of significantly higher overall quality than one generated from the original corpus. We suggest applying available high-quality pre-trained tools to filter noise rather than starting from scratch. Although this work addresses one specific multi-review corpus, the methods and conclusions should be helpful for generating summaries of other multi-review corpora.
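
The score-based filtering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the pre-trained models, metrics, or thresholds used, so the scoring function here (`keyword_overlap_score`), the topic-term set, and the threshold are all hypothetical stand-ins. In practice the `score` callable would wrap a pre-trained model or an established reference-based or reference-free metric.

```python
import re
from typing import Callable, List, Set


def split_sentences(review: str) -> List[str]:
    """Naively split a review into sentences on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def keyword_overlap_score(sentence: str, topic_terms: Set[str]) -> float:
    """Hypothetical relevance score: fraction of tokens that are topic terms.

    This is only a placeholder for a real metric or pre-trained model.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return 0.0
    return sum(t in topic_terms for t in tokens) / len(tokens)


def filter_review(review: str, score: Callable[[str], float], threshold: float) -> str:
    """Keep only the sentences whose score is at or above the threshold."""
    kept = [s for s in split_sentences(review) if score(s) >= threshold]
    return " ".join(kept)


# Assumed example data, not taken from the paper's corpus.
TOPIC_TERMS = {"battery", "screen", "lasts", "bright"}
REVIEW = "The battery lasts two days. My cousin visited last week! The screen is bright."

filtered = filter_review(
    REVIEW,
    score=lambda s: keyword_overlap_score(s, TOPIC_TERMS),
    threshold=0.2,  # assumed cut-off; a real threshold would be tuned on the corpus
)
print(filtered)
```

Passing the scorer as a callable keeps the filtering step independent of the metric, so a pre-trained model can be swapped in without changing `filter_review`.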

Keywords: natural language processing, text summarization, text filtering, automatic text filtering, score based filtering

DOI: 10.54941/ahfe1003766
