Informatics and Applications

2024, Volume 18, Issue 2, pp 47-53

ON THE GENERATION OF SYNTHETIC FEATURES BASED ON SUPPORT CHAINS AND ARBITRARY METRICS WITHIN THE FRAMEWORK OF A TOPOLOGICAL APPROACH TO DATA ANALYSIS. PART 2. EXPERIMENTAL TESTING ON PHARMACOINFORMATICS PROBLEMS

  • I. Yu. Torshin

Abstract

Representing the precedent relationships between features and a target variable as sets of elements of a Boolean lattice shows that synthetic features can be generated using metric distance functions. Approaches are formulated for (i) assessing the relevance ("informativeness") of metrics with respect to the problems being solved, (ii) generating synthetic features, and (iii) selecting those synthetic features that are more informative than the original feature descriptions. Topological analysis of 2400 samples of "molecule-property" data from ProteomicsDB yielded fairly effective algorithms for predicting the properties of molecules (rank correlation in cross-validation: 0.90 ± 0.23). On this set of problems, the metrics that most often generate informative synthetic features were identified: the maximum Kolmogorov deviation, the "oblique" distance, and the Lp, Rényi, and von Mises metrics. For the problems studied, polynomial correctors are shown to outperform neural network and random forest correctors.
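The abstract does not say exactly how synthetic features are built from support chains, so the following Python sketch only illustrates the general idea under stated assumptions. It is not the author's method. Each object's nonnegative feature vector is compared with a hypothetical reference profile under a chosen metric, the resulting distance is taken as a synthetic feature, and that feature's informativeness is scored by its Spearman rank correlation with the target. Two of the metrics named above are shown: the Lp distance, and a maximum Kolmogorov deviation that is interpreted here (an assumption) as the largest gap between the cumulative sums of the normalized vectors. The names `synthetic_feature`, `reference`, and the toy data are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def lp_distance(x: Vector, y: Vector, p: float = 2.0) -> float:
    """Minkowski (Lp) distance between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def kolmogorov_max_deviation(x: Vector, y: Vector) -> float:
    """Assumed reading of the maximum Kolmogorov deviation: the largest absolute
    gap between cumulative sums of x and y, each normalized to sum to 1
    (requires nonnegative entries and nonzero sums)."""
    sx, sy = float(sum(x)), float(sum(y))
    cx = cy = best = 0.0
    for a, b in zip(x, y):
        cx += a / sx
        cy += b / sy
        best = max(best, abs(cx - cy))
    return best


def ranks(values: Vector) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(u: Vector, v: Vector) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    ru, rv = ranks(u), ranks(v)
    n = len(ru)
    mu, mv = sum(ru) / n, sum(rv) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    ssu = sum((a - mu) ** 2 for a in ru)
    ssv = sum((b - mv) ** 2 for b in rv)
    if ssu == 0 or ssv == 0:
        return 0.0  # a constant vector carries no rank information
    return cov / math.sqrt(ssu * ssv)


def synthetic_feature(objects: Sequence[Vector], reference: Vector,
                      metric: Callable[[Vector, Vector], float]) -> List[float]:
    """Hypothetical synthetic feature: distance of each object to a reference."""
    return [metric(obj, reference) for obj in objects]


if __name__ == "__main__":
    # Toy data (illustrative only): five objects, three nonnegative features.
    objects = [[5, 0, 0], [3, 2, 0], [2, 2, 1], [1, 2, 2], [0, 1, 4]]
    reference = [0, 1, 4]      # hypothetical reference profile
    target = [5, 4, 3, 2, 1]   # property values to be predicted
    metrics = {"L2": lp_distance, "Kolmogorov": kolmogorov_max_deviation}
    for name, metric in metrics.items():
        feature = synthetic_feature(objects, reference, metric)
        # A rho far from 0 (either sign) marks an informative synthetic feature.
        print(f"{name}: rho = {spearman(feature, target):.3f}")
```

On this toy data both metrics print `rho = 1.000`, because both distances shrink monotonically as the objects approach the reference profile. On real data different metrics generally rank objects differently, which is what makes comparing their informativeness across many problems meaningful.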

