Effective deep learning through bidirectional reading on masked language model

Open Access
Article
Conference Proceedings
Authors: Hiroyuki Nishimoto

Abstract: Google BERT is a neural network that excels at natural language processing. It has two major strategies. One is the "Masked Language Model," which clarifies word-level relationships, and the other is "Next Sentence Prediction," which clarifies sentence-level relationships. In the Masked Language Model, some words in a sentence are masked, and BERT learns to predict the original words from context. Some questions come to mind. Why does BERT achieve effective learning by reading in two directions, forward and backward? What is the difference between the two directions of reading?

In the Masked Language Model, suppose the middle sentence is "I ate [mask] every morning." First, predict the masked word by forward reading. The preceding sentence is "Considering my health, I decided to change the breakfast menu." What is being asked here? We usually think about which word is more realistic and reach "an apple." The answer concerns feasibility. Next, predict the masked word by backward reading, focusing on the middle sentence and the following sentence, "A month later, I lost 3 kg and became healthy." What is being asked here? In such a situation, we look for success factors. We usually think about which word is more relevant and reach "an apple." The answer concerns causality. Therefore, BERT learns to predict feasibility by forward reading and causality by backward reading.

Besides, the bidirectional reading technique can be applied to scenario planning using back-casting from the future. Scenario planning means making assumptions about what the future is going to be. A scenario can be described in two ways: one is fore-casting and the other is back-casting. Fore-casting means viewing from the present toward the future. In general, back-casting means viewing from the present toward the past, but in this paper it means viewing from the future toward the present. Just as bidirectional reading yields two different kinds of prediction, there is a big difference between fore-casting into the future and back-casting from the future.

How do you feel about the first scenario, which uses fore-casting? You tend to focus on feasibility. Therefore, a long debate about feasibility begins, with questions such as "Is it possible?", "Is it difficult?", and "How can it be achieved?". On the other hand, how do you feel about the second scenario, which uses back-casting from the future to the present? This scenario has to be written in the past tense because of back-casting. Written in the past tense, it feels as if everything has already been done and someone has resolved all the problems by that time. The surrounding words in the past tense shift your perspective from prediction to event context. You tend to focus on the causal factors of success, and you can escape the long debate.

Scenario planning using back-casting from the future to the present therefore produces a good proposition while avoiding lengthy debate. Besides, regarding the mystery of deep learning, each answer lies in the human thinking mechanism, because AI is created by imitating the human brain.
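
To make the masked-word prediction described above concrete, here is a minimal sketch, not taken from the paper, assuming the Hugging Face transformers library and the pre-trained bert-base-uncased model: it fills the masked word in the abstract's example sentence once with the preceding sentence as context (forward reading) and once with the following sentence as context (backward reading).

```python
# Minimal sketch of Masked Language Model prediction with BERT.
# Assumes the Hugging Face `transformers` library and the
# `bert-base-uncased` checkpoint; the paper does not specify either.
from transformers import pipeline

# Load a pre-trained BERT masked-language model as a fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

middle = "I ate [MASK] every morning."

# Forward reading: the preceding sentence supplies the context.
forward = "Considering my health, I decided to change the breakfast menu. " + middle

# Backward reading: the following sentence supplies the context.
backward = middle + " A month later, I lost 3 kg and became healthy."

for label, text in [("forward", forward), ("backward", backward)]:
    predictions = fill_mask(text, top_k=3)
    # Each prediction is a dict with the predicted token and its score.
    print(label, [(p["token_str"], round(p["score"], 3)) for p in predictions])
```

Comparing the two candidate lists is one way to observe how the surrounding context, before or after the mask, steers the model toward different kinds of answers.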

Keywords: Masked Language Model, Bidirectional Reading, Back Casting From The Future

DOI: 10.54941/ahfe1001177
