Systems and Means of Informatics
2014, Volume 24, Issue 3, pp 204-217
USING HASH FUNCTION FOR INCREASING SPEED OF WORK OF THE SOFTWARE FOR MORPHOLOGICAL ANALYSIS OF RUSSIAN TEXTS
- N. V. Somin
- M. M. Sharnin
Abstract
The paper addresses the problem of making morphological analysis of Russian texts more efficient. It describes a software system for morphological analysis, including its set of morphological characteristics and its algorithms. It also mentions the software systems for logic-semantic analysis of natural language texts in which the morphological analyzer has been applied. The system is assessed in terms of memory footprint and processing speed. A method of storing morpholexical information using hash functions is proposed that provides fast access. The difficulties that arise in implementing this approach are discussed, and possible solutions are considered. The paper describes the structure of the data arrays in the new version and the search algorithms implemented in it, as well as a subsystem for entering and updating morphological information. It reports specific parameters of the new implementation and the speedup achieved relative to the previous version. Finally, it discusses prospects for developing the new version further and for carrying the proposed approach over to other components of the linguistic processor.
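As a rough illustration of the general idea of hash-based storage of morpholexical information, the sketch below keeps word forms in a fixed-size hash table and returns all readings of a form in one probe sequence. The abstract does not specify the hash function, the table layout, or the tag set, so everything here is an assumption: the FNV-1a hash, linear probing, the `MorphTable` class, and the tag strings are hypothetical and are not the authors' implementation.

```python
from typing import List, Optional, Tuple

FNV_OFFSET = 0x811C9DC5  # 32-bit FNV offset basis
FNV_PRIME = 0x01000193   # 32-bit FNV prime

Reading = Tuple[str, str]  # (lemma, tags); the tag format is hypothetical


def fnv1a(word: str) -> int:
    """32-bit FNV-1a hash of the UTF-8 bytes of a word form."""
    h = FNV_OFFSET
    for byte in word.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


class MorphTable:
    """Fixed-size open-addressing table mapping a word form to its readings."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self.slots: List[Optional[Tuple[str, List[Reading]]]] = [None] * capacity
        self.count = 0

    def _probe(self, word: str) -> int:
        # Linear probing: start at the hashed slot and step forward until
        # the word or an empty slot is found. One slot is always kept empty
        # (see add), so the loop terminates.
        i = fnv1a(word) % self.capacity
        while self.slots[i] is not None and self.slots[i][0] != word:
            i = (i + 1) % self.capacity
        return i

    def add(self, word: str, lemma: str, tags: str) -> None:
        i = self._probe(word)
        if self.slots[i] is None:
            if self.count + 1 >= self.capacity:
                raise MemoryError("table full; a real system would rebuild it")
            self.slots[i] = (word, [])
            self.count += 1
        # Homonymous forms accumulate several readings under one key.
        self.slots[i][1].append((lemma, tags))

    def lookup(self, word: str) -> List[Reading]:
        slot = self.slots[self._probe(word)]
        return list(slot[1]) if slot is not None else []


table = MorphTable(capacity=8)
table.add("стали", "сталь", "NOUN,gen,sg")
table.add("стали", "стать", "VERB,past,pl")
readings = table.lookup("стали")
```

In this sketch a lookup costs one hash computation plus a short probe sequence, which is the property that makes hash-based storage attractive for speed. Collisions and table growth are among the practical difficulties such a design has to handle.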
About this article
Cover Date
2013-11-30
DOI
10.14357/08696527140315
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
morphological analysis; hash function; linguistic processor; logic-semantic analysis of natural language texts
Authors
N. V. Somin and M. M. Sharnin
Author Affiliations
Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation