Informatics and Applications

2017, Volume 11, Issue 3, pp 60-72

IMPROVING CLASSIFICATION QUALITY FOR THE TASK OF FINDING INTRINSIC PLAGIARISM

  • I. O. Molybog
  • A. P. Motrenko
  • V. V. Strijov

Abstract

The paper addresses the classification problem in multidimensional spaces. The authors propose a supervised modification of the t-distributed stochastic neighbor embedding (t-SNE) algorithm. Unlike the original algorithm, the proposed modification does not require retraining when new data are added to the training set, and it can be easily parallelized. The method was applied to detect intrinsic plagiarism in a collection of documents. The authors also tested the algorithm on synthetic data and showed that classification quality is higher with the proposed algorithm than with no dimension reduction or with other dimension reduction algorithms.
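
For context, the objective of the standard (unsupervised) t-SNE algorithm can be written as below. This is the well-known original formulation only; the supervised modification proposed in the paper is not specified in this abstract and is not reproduced here. The symbols x_i (input points), y_i (low-dimensional images), sigma_i (per-point bandwidth), and N (number of points) are notation assumed for this sketch.

```latex
% Standard t-SNE (original, unsupervised formulation), shown for context only.
% Conditional similarity of x_j to x_i in the input space:
p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Similarity in the embedding space, using a Student t kernel with one degree of freedom:
q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}}

% The embedding minimizes the Kullback-Leibler divergence between P and Q:
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Because this objective is optimized directly over the embedded points y_i rather than over a parametric map, standard t-SNE gives no direct way to embed previously unseen points, which is the limitation that the abstract's remark about retraining refers to.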
