Systems and Means of Informatics
2019, Volume 29, Issue 3, pp 52-65
CLUSTERING METHOD OF NEWS MEDIA REPORTS BASED ON CONCEPTUAL ANALYSIS
- V. N. Zakharov
- R. R. Musabaev
- A. M. Krasovitskiy
- Y. D. Kozlovskaya
- Al-dr A. Khoroshilov
- Al-ey A. Khoroshilov
Abstract
The article describes a solution to the problem of clustering news media reports. The solution rests on two components: a technique, developed by the authors, for automatically calculating a measure of the semantic meaningfulness of the names of document concepts from their statistical, syntactic, and semantic features; and technologies for automatically generating declarative means for document clustering based on semantic-syntactic and conceptual analysis. Using the proposed technique and the software and declarative means created during the study, an experiment was conducted on a representative array of news media reports. Analysis of the results showed that using semantic correlating coefficients of concepts improves the accuracy of establishing semantic similarity between documents when the semantic meaningfulness of textual concept names is established automatically.
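The general idea of weighting concept names when comparing documents can be illustrated with a minimal sketch. It is not the authors' method: the per-concept weights below are hypothetical stand-ins for a meaningfulness measure, the similarity is an ordinary weighted cosine, and the greedy threshold clustering, the function names, and the sample data are all assumptions made for illustration.

```python
import math


def weighted_cosine(doc_a, doc_b, weights):
    """Cosine similarity of two {concept: frequency} dicts.

    Each frequency is scaled by a per-concept weight (hypothetical
    stand-in for a meaningfulness measure); unknown concepts get 1.0.
    """
    shared = set(doc_a) & set(doc_b)
    dot = sum(doc_a[c] * doc_b[c] * weights.get(c, 1.0) ** 2 for c in shared)
    norm_a = math.sqrt(sum((f * weights.get(c, 1.0)) ** 2 for c, f in doc_a.items()))
    norm_b = math.sqrt(sum((f * weights.get(c, 1.0)) ** 2 for c, f in doc_b.items()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, weights, threshold=0.5):
    """Greedy single-pass clustering (an assumed, deliberately simple scheme).

    A document joins the first cluster whose seed (first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of document indices
    for i, doc in enumerate(docs):
        for members in clusters:
            if weighted_cosine(docs[members[0]], doc, weights) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Illustrative data only: concept names with frequencies per report.
docs = [
    {"central bank": 2, "interest rate": 3, "report": 1},
    {"interest rate": 2, "central bank": 1, "statement": 1},
    {"football match": 3, "national team": 2, "report": 1},
]
# Generic concept names receive low weights, topical ones high weights.
weights = {
    "central bank": 1.0, "interest rate": 1.0,
    "football match": 1.0, "national team": 1.0,
    "report": 0.1, "statement": 0.1,
}
clusters = cluster(docs, weights)
```

In this toy data the low weight on the generic concept "report" keeps the finance and sports reports apart even though both contain it, which is the kind of effect a meaningfulness measure is meant to produce.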
About this article
Cover Date
2019-10-30
DOI
10.14357/08696527190305
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
text clustering; semantic-syntactic analysis; conceptual analysis; declarative means; statistical measure of meaningfulness of textual names of documents; semantic correlating coefficient; semantic similarity between documents
Author Affiliations
Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
Institute of Information and Computational Technologies, 125 Pushkin Str., Almaty 050010, Kazakhstan
Moscow Aviation Institute (National Research University), 4 Volokolamskoe Shosse, Moscow 125993, Russian Federation
Institute of Informatics Problems, Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
The 27th Central Research Institute of the Ministry of Defence of the Russian Federation, 5, 1st Khoroshevsky Passage, Moscow 123007, Russian Federation