Informatics and Applications

2022, Volume 16, Issue 4, pp 63-72

TECHNOLOGY FOR CLASSIFICATION OF CONTENT TYPES OF E-TEXTBOOKS

A. V. Bosov
A. V. Ivanov

Abstract

The paper addresses the automatic classification of educational content in an e-learning system, where the content consists of tasks or practical examples. Assessing the quality of educational content is a promising direction in the development of e-learning systems, and carrying out such an assessment motivates the creation of an automated classifier. The main idea is to model each content item as an object with two properties: a textual description in natural language and a set of formulas written in the TeX scientific typesetting language. A data set was prepared from the tasks of an electronic textbook on the theory of functions of a complex variable and labeled in accordance with this model. Four text classification algorithms were trained: a naive Bayes classifier, logistic regression, and single-layer and multilayer feedforward neural networks. Comparative experiments measured the classification accuracy of these classifiers using text content only, formula content only, and the full model. The experiments provided not only a formal comparison of the algorithms but also showed the fundamental advantage of the full model: when both the textual description and the TeX representation of the formulas are used, the classification accuracy significantly exceeds that of the one-factor variants, which confirms the readiness of the technology for practical application.
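
The two-property content model can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizers, the `w:`/`f:` token prefixes, the class labels, and the four toy tasks are all assumptions made for illustration. The classifier is a plain multinomial naive Bayes with add-one smoothing, one of the four algorithm families named in the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizers (assumptions): words from the description,
# TeX commands and single symbols from the formulas.
def text_tokens(text):
    return ["w:" + t for t in re.findall(r"[a-z]+", text.lower())]

def tex_tokens(formulas):
    pattern = r"\\[A-Za-z]+|[A-Za-z]|\d+|[^\sA-Za-z\d]"
    return ["f:" + t for f in formulas for t in re.findall(pattern, f)]

def features(item, use_text=True, use_tex=True):
    """Text only, formulas only, or the full two-property model."""
    toks = []
    if use_text:
        toks += text_tokens(item["text"])
    if use_tex:
        toks += tex_tokens(item["formulas"])
    return toks

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for toks, label in zip(docs, labels):
            self.token_counts[label].update(toks)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, toks):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.token_counts[label].values())
            score = math.log(count / n_docs)
            for t in toks:
                if t in self.vocab:  # tokens unseen in training are skipped
                    num = self.token_counts[label][t] + 1
                    score += math.log(num / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best

# Toy labeled tasks, invented for illustration only.
TRAIN = [
    ({"text": "Find the residue of the function at the pole",
      "formulas": [r"f(z)=\frac{1}{z^2+1}"]}, "residues"),
    ({"text": "Compute the residue at the singular point",
      "formulas": [r"f(z)=\frac{e^z}{z-1}"]}, "residues"),
    ({"text": "Expand the function in a Laurent series",
      "formulas": [r"f(z)=\sum_{n=0}^{\infty} z^n"]}, "series"),
    ({"text": "Find the Taylor series of the function",
      "formulas": [r"f(z)=\sum_{n=0}^{\infty}\frac{z^n}{n!}"]}, "series"),
]

def classify(item, use_text=True, use_tex=True):
    docs = [features(x, use_text, use_tex) for x, _ in TRAIN]
    model = NaiveBayes().fit(docs, [y for _, y in TRAIN])
    return model.predict(features(item, use_text, use_tex))
```

Calling `classify(item)` uses the full model, while `classify(item, use_tex=False)` and `classify(item, use_text=False)` give the two one-factor variants, so the three settings compared in the abstract can be run on the same data. Prefixing the tokens keeps a word such as "z" in the description distinct from the variable `z` in a formula.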

