Informatics and Applications
2024, Volume 18, Issue 2, pp 47-53
ON THE GENERATION OF SYNTHETIC FEATURES BASED ON SUPPORT CHAINS AND ARBITRARY METRICS WITHIN THE FRAMEWORK OF A TOPOLOGICAL APPROACH TO DATA ANALYSIS. PART 2. EXPERIMENTAL TESTING ON PHARMACOINFORMATICS PROBLEMS
Abstract
Treating the precedent relationships between features and a target variable as sets of Boolean lattice elements shows that synthetic features can be generated using metric distance functions. The paper formulates approaches to (i) assessing the relevance ("informativeness") of metrics for the problems being solved; (ii) generating synthetic features; and (iii) selecting synthetic features that are more informative than the original feature descriptions. Topological analysis of 2400 samples of "molecule-property" data from ProteomicsDB yielded fairly effective algorithms for predicting the properties of molecules (rank correlation in cross-validation: 0.90 ± 0.23). On this set of problems, the metrics that most often generate informative synthetic features were identified: the maximum Kolmogorov deviation, the "oblique" distance, and the Lp, Rényi, and von Mises metrics. For the studied set of problems, polynomial correctors are shown to have an advantage over neural network and random forest correctors.
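The general idea of scoring metrics by the informativeness of the features they generate can be illustrated with a minimal Python sketch. It is not the paper's support-chain construction, which the abstract does not specify. As a simplifying assumption, a synthetic feature here is just the distance from each object to one fixed reference object, and "informativeness" is taken to be the absolute Spearman rank correlation with the target. The toy data, the reference point, and all function names (lp_distance, chebyshev, score_metrics, and so on) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def lp_distance(p: float) -> Metric:
    """Return the Minkowski (Lp) distance function for a given p >= 1."""
    def dist(a: Vector, b: Vector) -> float:
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)
    return dist


def chebyshev(a: Vector, b: Vector) -> float:
    """L-infinity distance: the largest coordinate-wise deviation."""
    return max(abs(x - y) for x, y in zip(a, b))


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def score_metrics(
    objects: Sequence[Vector],
    target: Sequence[float],
    reference: Vector,
    metrics: Dict[str, Metric],
) -> Dict[str, float]:
    """Score each metric by |Spearman| between its distance feature and the target."""
    scores = {}
    for name, metric in metrics.items():
        # Assumption: one synthetic feature per metric, distance to the reference.
        feature = [metric(obj, reference) for obj in objects]
        scores[name] = abs(spearman(feature, target))
    return scores


# Hypothetical toy data: five objects with two numeric features each.
objects = [(0.0, 0.0), (2.0, 2.0), (3.0, 0.0), (1.0, 4.0), (5.0, 1.0)]
target = [0.0, 2.0, 3.0, 4.0, 5.0]
reference = (0.0, 0.0)

metrics = {"L1": lp_distance(1.0), "L2": lp_distance(2.0), "chebyshev": chebyshev}
scores = score_metrics(objects, target, reference, metrics)

# List the metrics from most to least informative on this toy problem.
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

In this sketch the ranking of metrics depends entirely on the chosen reference object and toy target, so it only shows the mechanics of comparing metrics, not any result from the paper.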
About this article
Cover Date
2024-06-20
DOI
10.14357/19922264240207
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
topological data analysis; lattice theory; algebraic approach of Yu. I. Zhuravlev; pharmacoinformatics
Authors
I. Yu. Torshin
Author Affiliations
Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation