Text Simplification with Reinforcement Learning Using Supervised Rewards on Grammaticality, Meaning Preservation, and Simplicity

Akifumi Nakamachi, Tomoyuki Kajiwara, Yuki Arase
Osaka University


Abstract

We optimize the rewards of reinforcement learning in text simplification using metrics that correlate highly with human judgments. To address exposure bias and the loss-evaluation mismatch, text-to-text generation tasks employ reinforcement learning that rewards task-specific metrics. Previous studies in text simplification use a weighted sum of sub-rewards covering three perspectives: grammaticality, meaning preservation, and simplicity. However, these rewards do not align with human judgments of the respective perspectives. In this study, we propose to use BERT regressors fine-tuned for grammaticality, meaning preservation, and simplicity as reward estimators, so that text simplification conforms to human judgments. Experimental results show that reinforcement learning with our rewards balances meaning preservation and simplicity. In addition, human evaluation confirms that texts simplified by our method are preferred over those of previous studies.
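
As a rough illustration of the reward described above, the following Python sketch combines three BERT-based regressors (one per perspective) into a weighted scalar reward. The checkpoint paths, input formats (single sentence vs. sentence pair), and mixing weights are assumptions made for illustration; they are not the authors' released artifacts or stated configuration.

    from typing import Optional

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"


    def load_regressor(checkpoint: str):
        """Load a BERT regressor: a sequence classifier with a single output
        (num_labels=1), which the transformers library treats as regression."""
        tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        model = AutoModelForSequenceClassification.from_pretrained(
            checkpoint, num_labels=1
        ).to(DEVICE).eval()
        return tokenizer, model


    @torch.no_grad()
    def score(tokenizer, model, text: str, pair: Optional[str] = None) -> float:
        """Scalar regression score for a sentence or a sentence pair."""
        inputs = tokenizer(text, pair, return_tensors="pt",
                           truncation=True).to(DEVICE)
        return model(**inputs).logits.squeeze().item()


    # Hypothetical paths to regressors fine-tuned on human ratings of each
    # perspective; the actual checkpoints are not specified in the abstract.
    g_tok, g_model = load_regressor("models/bert-grammaticality")
    m_tok, m_model = load_regressor("models/bert-meaning-preservation")
    s_tok, s_model = load_regressor("models/bert-simplicity")


    def reward(source: str, output: str,
               w_g: float = 1.0, w_m: float = 1.0, w_s: float = 1.0) -> float:
        """Weighted sum of the three estimated sub-rewards, used as the
        scalar reward for the reinforcement-learning policy update."""
        g = score(g_tok, g_model, output)          # grammaticality of output
        m = score(m_tok, m_model, source, output)  # meaning preservation (pair)
        s = score(s_tok, s_model, source, output)  # simplicity (assumed pair)
        return w_g * g + w_m * m + w_s * s

In a REINFORCE-style setup, this scalar would weight the log-likelihood of a sampled simplification during the policy update. The equal default weights are placeholders, since the abstract does not specify the mixing coefficients.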