“Informatics and Applications” scientific journal

Volume 11, Issue 3, 2017

ANALOGS OF GLESER’S THEOREM FOR NEGATIVE BINOMIAL AND GENERALIZED GAMMA DISTRIBUTIONS AND SOME OF THEIR APPLICATIONS
  • V. Yu. Korolev Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, Moscow 119991, GSP-1, Russian Federation; Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation; Hangzhou Dianzi University, Higher Education Zone, Hangzhou 310018, China

Abstract: It is proved that the negative binomial distributions with the shape parameter less than one are mixed geometric distributions. The mixing distribution is written out explicitly. Thus, the similar result of L. Gleser, stating that the gamma distributions with the shape parameter less than one are mixed exponential distributions, is transferred to the discrete case. An analog of Gleser’s theorem is also proved for generalized gamma distributions. For mixed binomial distributions related to the negative binomial laws with the shape parameter less than one, the case of a small probability of success is considered and an analog of the Poisson theorem is proved. The representation of the negative binomial distributions as mixed geometric laws is used to prove limit theorems for negative binomial random sums of independent identically distributed random variables, in particular, analogs of the law of large numbers and the central limit theorem. Both cases of light and heavy tails are considered. The expressions for the moments of limit distributions are obtained. The obtained alternative equivalent mixture representations of the limit laws provide better understanding of how mixed probability (Bayesian) models are formed.

Keywords: negative binomial distribution; mixed geometric distribution; generalized gamma distribution; stable distribution; Laplace distribution; Mittag-Leffler distribution; Linnik distribution; mixed binomial distribution; Poisson theorem; random sum; law of large numbers; central limit theorem
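
For orientation, the mixture representations described in the abstract above can be written in a standard form. The following is a sketch under assumed notation (shape parameter $r \in (0,1)$, rate $\mu > 0$, success probability $p$, mixing density $g$; none of these symbols are fixed by the abstract), and it is not necessarily the parametrization used in the paper.

```latex
% Gleser-type representation: a gamma density with shape r in (0,1) and rate \mu
% is a mixture of exponential densities z e^{-zx} with mixing density g.
\[
\frac{\mu^{r}}{\Gamma(r)}\,x^{r-1}e^{-\mu x}
  = \int_{\mu}^{\infty} z\,e^{-zx}\,g(z;r,\mu)\,dz,
\qquad
g(z;r,\mu) = \frac{\mu^{r}}{\Gamma(r)\,\Gamma(1-r)}\cdot\frac{1}{(z-\mu)^{r}\,z},
\quad z>\mu .
\]

% Discrete analog: the negative binomial law with shape r in (0,1) is a mixture
% of geometric laws with success probability z/(1+z), where \mu = p/(1-p).
\[
\frac{\Gamma(k+r)}{k!\,\Gamma(r)}\,p^{r}(1-p)^{k}
  = \int_{\mu}^{\infty} \frac{z}{1+z}\Bigl(\frac{1}{1+z}\Bigr)^{k} g(z;r,\mu)\,dz,
\qquad k = 0,1,2,\dots
\]
```

The second identity follows from the first because a negative binomial law is a Poisson law whose parameter is gamma distributed, and a Poisson law whose parameter is exponentially distributed with rate $z$ is geometric with success probability $z/(1+z)$.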

SEGMENTATION OF NONSTATIONARY SIGNALS USING STOCHASTIC CHARACTERISTICS OF THE WINDOW VARIANCE
  • M. A. Dranitsyna Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation
  • T. V. Zakharova Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation; Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: Signal or response partitioning (i. e., signal segmentation) is of great interest, e. g., for biomedical research. Signal segmentation, being an essential part of signal processing, may serve as a tool for advanced signal interpretation and data classification. Segmentation of nonstationary signals with a small signal-to-noise ratio is a particularly complicated task. The paper is mainly devoted to exploration of the window variance noise component as a random variable for the proposed signal models. Some stochastic characteristics of the window variance noise components are investigated in accordance with the models. Theoretical findings are consistent with the previously obtained empirical characteristics of the window variance noise component and are supposed to be of potential use for signal segmentation and prediction.

Keywords: window variance; signal model
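
To make the notion of window variance in the abstract above concrete, here is a minimal Python sketch. It assumes that window variance means the sample variance over a sliding window of fixed width. The thresholding rule, the window width, and the synthetic test signal are illustrative assumptions, not the authors' signal models or segmentation procedure.

```python
import random

def window_variance(signal, width):
    """Return the sample variance of each length-`width` sliding window."""
    if width < 2 or width > len(signal):
        raise ValueError("width must satisfy 2 <= width <= len(signal)")
    out = []
    for start in range(len(signal) - width + 1):
        window = signal[start:start + width]
        mean = sum(window) / width
        out.append(sum((x - mean) ** 2 for x in window) / (width - 1))
    return out

def segment_by_threshold(variances, threshold):
    """Label each window 1 if its variance exceeds the threshold, else 0.

    Thresholding is an assumed, simplistic segmentation rule.
    """
    return [1 if v > threshold else 0 for v in variances]

# Hypothetical test signal: low-variance noise followed by a high-variance burst.
rng = random.Random(0)
signal = ([rng.gauss(0.0, 0.1) for _ in range(200)]
          + [rng.gauss(0.0, 1.0) for _ in range(200)])
variances = window_variance(signal, width=50)
labels = segment_by_threshold(variances, threshold=0.25)
```

In this toy setting, the labels switch from 0 to 1 near the point where the noise level changes. With a small signal-to-noise ratio, the window variance is itself noisy, which is why its stochastic characteristics matter.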

SUPERVISED LEARNING CLASSIFICATION OF INCOMPLETE CLINICAL DATA
  • M. P. Krivenko Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article examines the effectiveness of classification methods for incomplete clinical data. The Bayesian classifier is trained by the maximum likelihood method for a mixture of normal distributions. A rigorous derivation of the formulas for the steps of the EM algorithm made it possible to apply the iterative estimation of the mixture parameters correctly. For incomplete data, methods for selecting initial values and correcting degenerate covariance matrices of the mixture components are proposed. The experimental part of the work analyzes how the quality of classification depends on the number of missing individual values, using data on enzymes obtained for patients with liver diseases. On the real data, simple and complex methods of processing missing values yielded almost identical classification errors when the number of randomly missing individual values was low.

Keywords: missing data; EM algorithm; mixtures of normal distributions

COMPUTER MODEL OF SYNERGY OF TEAM DECISION-MAKING
  • I. A. Kirikov Kaliningrad Branch of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 5 Gostinaya Str., Kaliningrad 236000, Russian Federation
  • A. V. Kolesnikov Kaliningrad Branch of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 5 Gostinaya Str., Kaliningrad 236000, Russian Federation; Immanuel Kant Baltic Federal University, 14 A. Nevskogo Str., Kaliningrad 236041, Russian Federation
  • S. V. Listopad Kaliningrad Branch of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 5 Gostinaya Str., Kaliningrad 236000, Russian Federation

Abstract: Practical problems in the management of complex sociotechnical systems are characterized by a variety of NOT-factors (following the terminology suggested by A. S. Narinyani) that hamper their solution. Traditionally, teams of experts under the leadership of a decision-maker are involved in such problems to deal with the heterogeneity of information and the dynamic nature of the problem. For the same reason, the modeling of team processes in decision support systems is important for the automated solving of complex problems. The article deals with the issues of modeling the process of collective solving of complex problems and the resulting synergy effect, when an integrated solution is better than any decision of experts working individually.

Keywords: small team of experts; synergy; hybrid intelligent multiagent system

METHODS OF CATEGORY THEORY IN MODEL-BASED SYSTEMS ENGINEERING
  • S. P. Kovalyov Institute of Control Sciences, Russian Academy of Sciences, 65 Profsoyuznaya Str., Moscow 117997, Russian Federation

Abstract: A mathematical device based on the category theory is proposed to formally describe and rigorously explore procedures of employing models in engineering that constitute the contents of model-based systems engineering (MBSE). The essence of the device consists in mathematical representation of assembly drawings (megamodels of systems) as diagrams in categories whose objects are models, and morphisms represent actions associated with assembling system models from component models. The soundness of the device is justified on the basis of standards that govern description of the systems’ structure such as IEC 81346. Category-theoretical methods for solving a number of practical problems of assembling systems are proposed and explored. Examples of solving such problems are provided in categories that represent two key application areas for MBSE: geometric modeling of complex shapes and discrete-event simulation of the behavior of industrial systems.

Keywords: model-based systems engineering; megamodel; category theory; colimit

ON EFFICIENCY OF THE HIERARCHICAL ALGORITHM FOR SEARCHING APPROXIMATE NEAREST NEIGHBOR IN A GIVEN SET OF IMAGES
  • M.M. Lange Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • S.N. Ganebnykh Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • A.M. Lange Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The efficiency of the hierarchical algorithm for searching for an approximate nearest neighbor in a given set of images subject to an unwarranted error about the nearest image is investigated. The algorithm uses a space of quad pyramidal image representations as well as a guided search strategy in successive representation levels of increasing resolution. The efficiency is studied in terms of both an empirical distribution of search errors and computational complexity of the hierarchical algorithm relative to the exhaustive search. The above characteristics are obtained for two applications, namely, search for an approximate nearest image in a set of handwritten digits from the MNIST database and gridding a given noisy image in an aerospace digital map from the Google Maps network service.

Keywords: image; quad pyramidal representation; digital map; nearest neighbor; approximate nearest neighbor; search error; empirical distribution; computational complexity
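
The coarse-to-fine idea in the abstract above can be sketched as follows. This is a minimal illustration, assuming square grayscale images whose side is a power of two, a pyramid built by averaging 2x2 blocks, and a pruning rule that keeps the best `keep` candidates at each level. The function names and the pruning rule are hypothetical; the authors' guided search strategy may differ.

```python
def downsample(img):
    """Average each 2x2 block, halving the resolution (one pyramid step)."""
    n = len(img) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(n)] for r in range(n)]

def pyramid(img):
    """Return representations ordered from coarsest (1x1) to finest (original)."""
    levels = [img]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels[::-1]

def sq_dist(a, b):
    """Squared Euclidean distance between two equally sized images."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def hierarchical_search(query, images, keep=2):
    """Approximate nearest neighbor: prune candidates level by level."""
    q = pyramid(query)
    pyramids = [pyramid(im) for im in images]
    candidates = list(range(len(images)))
    for level in range(len(q)):
        candidates.sort(key=lambda i: sq_dist(q[level], pyramids[i][level]))
        if level < len(q) - 1:
            candidates = candidates[:keep]  # assumed pruning rule
    return candidates[0]

# Hypothetical 4x4 data set and a noisy query resembling images[2].
images = [
    [[0] * 4 for _ in range(4)],   # uniformly dark
    [[8] * 4 for _ in range(4)],   # uniformly bright
    [[0, 0, 8, 8]] * 4,            # dark left half, bright right half
    [[8, 8, 0, 0]] * 4,            # mirrored version
]
query = [[1, 0, 8, 7], [0, 0, 8, 8], [0, 1, 8, 8], [0, 0, 7, 8]]
best = hierarchical_search(query, images)
```

Because candidates are discarded using only coarse representations, the result can differ from the exhaustive nearest neighbor. This is the kind of search error whose empirical distribution the abstract refers to.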

IMPROVING CLASSIFICATION QUALITY FOR THE TASK OF FINDING INTRINSIC PLAGIARISM
  • I. O. Molybog Center for Energy Systems, Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, 3 Nobel Str., Moscow 143026, Russian Federation; Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation
  • A. P. Motrenko Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation
  • V. V. Strijov A. A. Dorodnicyn Computing Center, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper addresses the classification problem in multidimensional spaces. The authors propose a supervised modification of the t-distributed Stochastic Neighbor Embedding Algorithm. Additional features of the proposed modification are that, unlike the original algorithm, it does not require retraining if new data are added to the training set and can be easily parallelized. The novel method was applied to detect intrinsic plagiarism in a collection of documents. The authors also tested the performance of their algorithm using synthetic data and showed that the quality of classification is higher with the algorithm than without or with other algorithms for dimension reduction.

Keywords: data analysis; dimension reduction; nonlinear dimension reduction; manifold learning; intrinsic plagiarism detection

METHODS FOR INTRINSIC PLAGIARISM DETECTION
  • K. F. Safin Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation; Antiplagiat JSC, 33 Varshavskoe Shosse, Moscow 117105, Russian Federation
  • M. P. Kuznetsov "Forecsys" LLC, 42 Vavilov Str., Moscow 119333, Russian Federation
  • M. V. Kuznetsova Moscow Institute of Physics and Technology, 9 Institutskiy Per., Dolgoprudny, Moscow Region 141700, Russian Federation; Antiplagiat JSC, 33 Varshavskoe Shosse, Moscow 117105, Russian Federation

Abstract: There are two ways to find plagiarism in documents: “external” and “intrinsic” plagiarism detection. External plagiarism detection is the task with a known set of possible references. Intrinsic plagiarism detection aims at discovering plagiarism by analyzing only the document itself. The paper investigates the methods of intrinsic plagiarism detection. The authors developed a plagiarism detection method based on constructing statistics from the features of the document parts and detecting outliers. The proposed algorithm was tested on the PAN-2011 collection for intrinsic plagiarism detection.

Keywords: natural language processing; intrinsic plagiarism detection; outliers detection
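
The general scheme named in the abstract above (features of document parts, then outlier detection) can be illustrated with a minimal Python sketch. The single feature (mean word length), the fixed segment size, and the z-score threshold are hypothetical choices made for illustration; they are not the statistics or the outlier detector used by the authors.

```python
import statistics

def segment_features(text, segment_words=50):
    """Split text into fixed-size word segments and compute one feature each.

    Mean word length is an assumed stand-in for real stylometric features.
    """
    words = text.split()
    segments = [words[i:i + segment_words]
                for i in range(0, len(words), segment_words)]
    return [sum(len(w) for w in seg) / len(seg) for seg in segments if seg]

def flag_outliers(features, z_threshold=2.0):
    """Return indices of segments whose feature deviates from the document norm."""
    mean = statistics.fmean(features)
    spread = statistics.pstdev(features)
    if spread == 0:
        return []  # perfectly uniform style: nothing to flag
    return [i for i, f in enumerate(features)
            if abs(f - mean) / spread > z_threshold]

# Hypothetical document: uniform style with one stylistically different passage.
text = " ".join(["cat"] * 200 + ["extraordinarily"] * 50 + ["cat"] * 250)
features = segment_features(text)
suspicious = flag_outliers(features)
```

Here the fifth segment (index 4) is flagged because its feature value is far from the rest of the document. No external reference collection is consulted, which is what distinguishes the intrinsic setting from the external one.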

PSYCHOLINGUISTIC ANALYSIS OF TEXT MESSAGES IN RUSSIAN BASED ON THEIR PHONOSEMANTIC STATISTICAL CHARACTERISTICS
  • A. S. Sigov Moscow Technological University, 78 Vernadsky Av., Moscow 119454, Russian Federation
  • D. A. Akimov Moscow Technological University, 78 Vernadsky Av., Moscow 119454, Russian Federation
  • D. O. Zhukov Moscow Technological University, 78 Vernadsky Av., Moscow 119454, Russian Federation
  • E. G. Andrianova Moscow Technological University, 78 Vernadsky Av., Moscow 119454, Russian Federation
  • V. E. Sachkov Moscow Technological University, 78 Vernadsky Av., Moscow 119454, Russian Federation
  • V. K. Raev Moscow Technological University, 78 Vernadsky Av., Moscow 119454, Russian Federation

Abstract: A text as a complex semantic and syntactic formation has a number of psycholinguistic characteristics, which include integrity and semantic orientation. A text can be viewed as a product of speech activity with a high degree of semantic variation determined by its temporal and sonar characteristics. Nonverbal behavior of network entities — virtual masks and robotic agents — reveals itself in texts. The article raises and solves the problem of identifying the type of accentuation of pattern of behavior of a virtual entity based on statistical analysis of text communication, which allows one to formulate a hypothesis about the structural properties of a given communication and build a matrix of probabilities of relationship between virtual masks of subjects. The practical significance of the proposed solution is based on the growing importance of the development of the system of conditional signs, in this case, the conditional languages of e-communication, for the generation of control clusters regulating the social behavior of virtual subjects in the network. This assumption is based on the hypothesis of Kenneth Ivers, according to which, the better the system of conventional signs, the more opportunities to create new algorithms.

Keywords: psycholinguistic characteristics; nonverbal behavior; virtual masks; process of thinking; semantic meaning; linguistic relativism

PROBABILITY MODEL FOR ANALYZING LICENSED SHARED ACCESS WITH ADAPTIVE POWER CONTROL IN A WIRELESS NETWORK
  • I. A. Gudkova Peoples’ Friendship University of Russia, 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation; Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • S. Ya. Shorgin Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: Emerging next generation wireless networks involve new applications and services for human-to-human and machine-to-machine (M2M) devices. The problem of increasing requirements for network capacity and lack of radio spectrum arises. The solution could be found in the licensed shared access framework, e. g., in the case of smart cities. The authors propose a mathematical model of shared access to spectrum with adaptive power control. The algorithm makes it possible to avoid the interference between M2M devices and the spectrum owner due, in part, to the fact that it takes into account the spatial distribution and session activity of devices.

Keywords: wireless network; smart city; machine-to-machine (M2M); licensed shared access (LSA); adaptive power control; stochastic process; recursive algorithm; blocking probability; interruption probability; average number of M2M devices

QUEUING SYSTEMS WITH RESOURCES AND SIGNALS AND THEIR APPLICATION FOR PERFORMANCE EVALUATION OF WIRELESS NETWORKS
  • K. E. Samouylov Peoples’ Friendship University of Russia, 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation; Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • E. S. Sopin Peoples’ Friendship University of Russia, 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation; Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • S. Ya. Shorgin Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper considers a queuing system with limited resources, random requirements, and signals. Each customer occupies a server and a random amount of resources for the whole service duration. In addition, a Poisson flow of signals arrives at the queue. A signal arrival triggers the resource reallocation process. The model can describe the functioning of a wireless network taking into account user movement during a session. Two cases are considered: independent movement of users, when resources are reallocated independently for each session, and joint movement, when all resources are reallocated at once.

Keywords: queuing system; random requirement; signals; limited resources; wireless network; LTE-advanced

REVISITING JOINT STATIONARY DISTRIBUTION IN TWO FINITE CAPACITY QUEUES OPERATING IN PARALLEL
  • L. Meykhanadzhyan School No. 281 of Moscow, 7 Raduzhnaya Str., Moscow 129344, Russian Federation
  • S. Matyushenko Peoples’ Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation
  • D. Pyatkina Peoples’ Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation
  • R. Razumchik Peoples’ Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation; Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper revisits the problem of the computation of the joint stationary probability distribution $p_{ij}$ in a queueing system consisting of two single-server queues, each of capacity $N \ge 3$, operating in parallel, and a single Poisson flow. Upon each arrival instant, one customer is put simultaneously into each system. When a customer sees a full system, it is lost. The service times are exponentially distributed with different parameters. Using the approach based on generating functions, the authors obtain a new system of equations of a smaller size than the size of the original system of equilibrium equations ($3N - 2$ compared to $(N + 1)^2$). Given the solution of the new system, the whole joint stationary distribution can be computed recursively. The new system gives some insights into the interdependence of $p_{ij}$ and $p_{nm}$. If relations between $p_{i-1,N}$ and $p_{i,N}$ for $i = 3, 5, 7, \ldots$ are known, then the blocking probability can be computed recursively. Using the known results for the asymptotic behavior of $p_{ij}$ as $i, j \to \infty$, the authors illustrate this idea by a simple numerical example.

Keywords: two queues; generating function; stationary distribution; paired
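
As a baseline for the model in the abstract above, the full joint stationary distribution can be computed by brute force from all (N + 1)^2 states. The sketch below does this by power iteration on the uniformized Markov chain. It assumes that each queue independently drops its copy of an arriving customer when that queue is full. The function name and the parameter values are hypothetical, and this is not the generating-function method of the paper.

```python
def stationary_distribution(N, lam, mu1, mu2, iterations=5000):
    """Joint stationary distribution p[i][j] by power iteration.

    The continuous-time chain is uniformized with rate lam + mu1 + mu2,
    which leaves the stationary distribution unchanged.
    """
    rate = lam + mu1 + mu2
    p = [[1.0 / (N + 1) ** 2] * (N + 1) for _ in range(N + 1)]
    for _ in range(iterations):
        nxt = [[0.0] * (N + 1) for _ in range(N + 1)]
        for i in range(N + 1):
            for j in range(N + 1):
                mass = p[i][j]
                # Paired arrival: one customer per queue; a full queue drops its copy.
                nxt[min(i + 1, N)][min(j + 1, N)] += mass * lam / rate
                # Service completion in queue 1 (self-loop if it is empty).
                nxt[max(i - 1, 0)][j] += mass * mu1 / rate
                # Service completion in queue 2 (self-loop if it is empty).
                nxt[i][max(j - 1, 0)] += mass * mu2 / rate
        p = nxt
    return p

# Hypothetical parameters: capacity 3, arrival rate 1, service rates 2 and 3.
p = stationary_distribution(3, 1.0, 2.0, 3.0)
marginal1 = [sum(row) for row in p]  # distribution of the first queue length
```

For N = 3, this works with 16 unknowns, whereas the reduced system mentioned in the abstract has 3N - 2 = 7 equations. Under the stated assumption, each queue taken alone behaves as an M/M/1/N queue, so the marginals offer a simple sanity check on the joint distribution.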

ON PARALLELIZATION OF ASYMPTOTICALLY OPTIMAL DUALIZATION ALGORITHMS
  • E. V. Djukova Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation; M.V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation
  • A. G. Nikiforov Technical University of Munich, 21 Arcisstrasse, Munich 80333, Germany
  • P. A. Prokofyev Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The main goal of the paper is to develop and implement an approach to building efficient parallel algorithms for intractable enumeration problems and to apply this approach to one of the central enumeration problems, i. e., dualization. Asymptotically optimal algorithms for dualization are considered to be the fastest among the known ones. Their efficiency on average has a theoretical justification. The size of the enumerated set in the dualization problem grows exponentially with the size of the input; thus, it is reasonable to use parallel computations. The authors introduce a static parallelization scheme for asymptotically optimal dualization algorithms and present the test results. The experimental results are processed statistically to determine the kind of distribution of the random variables representing the sizes of the subtasks for parallel computation. The conditions under which the scheme demonstrates almost maximum speedup and a fairly uniform processor load are identified.

Keywords: discrete enumeration problem; dualization; asymptotically optimal algorithm; irreducible covering of a Boolean matrix; polynomial-time delay algorithm; parallel dualization algorithm
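
To illustrate the dualization problem from the abstract above, the following Python sketch enumerates the irreducible coverings of a small Boolean matrix by exhaustive search. It is deliberately a brute-force baseline, not an asymptotically optimal algorithm. Splitting the result set by smallest column index is an assumed, simplified stand-in for a static partition into subtasks.

```python
from itertools import combinations

def is_covering(matrix, cols):
    """True if every row has a 1 in at least one of the chosen columns."""
    return all(any(row[c] for c in cols) for row in matrix)

def irreducible_coverings(matrix):
    """Enumerate all irreducible (minimal) coverings by exhaustive search.

    Column sets are visited by increasing size, so any covering that is not
    a superset of an earlier result has no proper covering subset.
    """
    n_cols = len(matrix[0])
    found = []
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            if any(set(prev) <= set(cols) for prev in found):
                continue  # reducible: contains a smaller covering
            if is_covering(matrix, cols):
                found.append(cols)
    return found

def subtask_sizes(coverings, n_cols):
    """Count results per smallest column index (assumed static partition)."""
    sizes = [0] * n_cols
    for cols in coverings:
        sizes[cols[0]] += 1
    return sizes

# Hypothetical 3x4 Boolean matrix.
matrix = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
coverings = irreducible_coverings(matrix)
sizes = subtask_sizes(coverings, len(matrix[0]))
```

Even in this tiny example the subtasks have unequal sizes. This hints at why the distribution of subtask sizes matters for achieving a uniform processor load under a static scheme.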

STATISTICAL DATA AS INFORMATION SOURCE FOR LINGUISTIC ANALYSIS OF RUSSIAN CONNECTORS
  • O. Inkova Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • N. Popkova Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The aim of this paper is to describe statistical data gathered from the supracorpora database (SCDB) of connectors for further analysis of their formal and functional properties. Until now, these properties have usually been described by means of semantic analysis, while corpus data, if used at all, have not been subject to statistical processing. Automatically generated and verifiable information collected from text corpora can be one of the most reliable tools in the analysis of linguistic units, including connectors. The paper shows what statistics one may obtain from the SCDB and how to use them in linguistic analysis, taking as an example tol’ko, a polyfunctional linguistic unit that can be a part of multicomponent and two-place connectors.

Keywords: annotation of connectors; corpus linguistics; supracorpora databases; parallel texts; statistical

INDICATOR EVALUATION OF PROCESSES OF KNOWLEDGE TRANSFER FROM SCIENCE TO TECHNOLOGY
  • I. M. Zatsman Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • G. V. Lukyanov Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • V. A. Minin Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • V. A. Havanskov Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • S. K. Shubnikov Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article is dedicated to indicator evaluation of information interactions between science and technology. Some of these indicators are defined as single numeric values and some as matrices of numeric values that characterize the intensity of the knowledge flow from different research areas into specific technological branches. The article provides a description of primary information resources, mainly full-text descriptions of patents, which are used to define numerical values of these indicators. It also gives a description of secondary information resources generated as the result of patent documentation processing, including information on references to scientific publications cited in patents. Primary and secondary resources were used to create and test the information model and the corresponding indicators of assessment of interaction between science and technology. This model was applied as a foundation for calculation of numerical values of integral and thematic indicators of the intensity of scientific knowledge flow into the branch of information technologies.

Keywords: information interaction between science and technology; citation of scientific works; intensity of the knowledge flow; indicator assessment; information technology
