Informatics and Applications

2020, Volume 14, Issue 3, pp 129-135

USING TOPIC MODELS FOR PAIRWISE COMPARISON OF COLLECTIONS OF SCIENTIFIC PAPERS

  • F. V. Krasnov
  • A. V. Dimentov
  • M. E. Shvartsman

Abstract

The authors propose a new technique for pairwise comparison of collections of scientific articles using a topic model. The methodology, called Comparative Topic Analysis (CTA), yields not only a quantitative assessment of how similar two collections are but also the structural differences between them. The authors also developed a transparent visualization of the distance between text collections. The study compares existing approaches to topic modeling with respect to the task of comparing collections of scientific papers; both probabilistic and generative topic models are considered. The requirements that text collections must meet for CTA to be applied correctly are analyzed. CTA proved highly effective at identifying structural differences between related collections. The authors introduce an integral metric, the "Content Uniqueness Ratio," which allows text collections to be compared with one another. In the computational experiment, the topic model with additive regularization (ARTM) proved to be the most informative.
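
As a rough illustration of what comparing two collections through a topic model can look like, the sketch below averages per-document topic distributions into a collection-level profile, measures the distance between two profiles with the Jensen-Shannon divergence, and ranks topics by how much their share differs. This is not the authors' CTA procedure and it does not compute the Content Uniqueness Ratio, whose definition is not given in the abstract. The function names, the choice of Jensen-Shannon divergence, and the toy data are all assumptions made for illustration.

```python
import math

# Illustrative sketch only: NOT the CTA method from the article.
# Assumption: each collection is given as a document-topic matrix,
# one row per document, each row a probability distribution over topics.

def topic_profile(theta):
    """Average per-document topic distributions into one collection-level profile."""
    n_docs = len(theta)
    n_topics = len(theta[0])
    return [sum(doc[k] for doc in theta) / n_docs for k in range(n_topics)]

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def topic_gaps(p, q):
    """Per-topic difference in share, sorted by absolute size (largest first)."""
    gaps = [(k, pi - qi) for k, (pi, qi) in enumerate(zip(p, q))]
    return sorted(gaps, key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy data: two collections, two documents each, three topics.
theta_a = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
theta_b = [[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]]

profile_a = topic_profile(theta_a)
profile_b = topic_profile(theta_b)
distance = js_divergence(profile_a, profile_b)   # single similarity score
gaps = topic_gaps(profile_a, profile_b)          # which topics drive the difference
```

The single score plays the role of a quantitative similarity assessment, while the ranked gaps hint at where the collections differ structurally. In practice the document-topic matrices would come from a fitted topic model trained on both collections together.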
