Институт проблем информатики Российской Академии наук
Институт проблем информатики Российской Академии наук
Российская Академия наук

Институт проблем информатики Российской Академии наук




«INFORMATICS AND APPLICATIONS»
Scientific journal
Volume 12, Issue 3, 2018

Content | About  Authors

Abstract and Keywords.

FINDING CONTROL POLICY FOR ONE DISCRETE-TIME MARKOV CHAIN ON [0,1] WITH A GIVEN INVARIANT MEASURE
  • M. G. Konovalov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • R. V. Razumchik  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation, Peoples' Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation

Abstract: A discrete-time Markov chain on the interval [0,1] with two possible transitions (left or right) at each step has been considerred. The probability of transition towards 0 (and towards 1) is a function of the current value of the chain. Having chosen the direction, the chain moves to the randomly chosen point from the appropriate interval.
The authors assume that the transition probabilities depend on the current value of the chain only through a finite number of real-valued numbers. Under this assumption, they seek the transition probabilities, which guarantee the L2 distance between the stationary density of the Markov chain and the given invariant measure on [0,1] is minimal. Since there is no reward function in this problem, it does not fit in the MDP (Markov decision process) framework. The authors follow the sensitivity-based approach and propose the gradient- and simulation-based method for estimating the parameters of the transition probabilities. Numerical results are presented which show the performance of the method for various transition probabilities and invariant measures on [0,1].

Keywords: Markov chain; control; continuous state space; sensitivity-based approach; derivative estimation

MEAN-SQUARE THRESHOLDING RISK WITH A RANDOM SAMPLE SIZE
  • O.V. Shestakov   Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: Nonlinear methods of signal de-noising based on the threshold processing of wavelet coefficients are widely used in various application areas. These methods have gained their popularity due to the speed of the algorithms for constructing estimates and the possibility of adapting to functions belonging to different classes of regularity better than linear methods. When applying thresholding techniques, it is usually assumed that the number of wavelet coefficients is fixed and the noise distribution is Gaussian. This model has been well studied in the literature, and optimal threshold values have been calculated for different classes of signals. However, in some situations, the sample size is not known in advance and is modeled by a random variable. The present author considers a model with a random number of observations containing Gaussian noise and estimates the order of the mean-square risk with increasing sample size.

Keywords: thresholding; random sample size; mean-square risk

BAYESIAN BALANCE MODELS
  • A. A. Kudryavtsev  Department of Mathematical Statistics, Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University, 1-52 Leninskiye Gory, GSP-1, Moscow 119991, Russian Federation

Abstract: A number of previous author's works were devoted to the Bayesian approach to queuing and reliability In this paper, the method application is extended to a wide range of problems, such as demography, physics, political science, modeling of emergencies, medicine, etc. The method is based on separation of system factors into two groups: those that support functioning of the system (positive, or p-factors) and those that inhibit system's functioning (negative, or n-factors). In the paper, system's balance index, which equals to the ratio of n- and p-factors, and the advantage index, which equals to the ratio of p-factor to the sum of n- and p-factors, are considered. It is assumed that the factors, which affect the system, change over time, and besides their exact values are impossible to determine due to the measuring equipment's imperfections, excessively high expenses on thorough research, lack of time and resources, and so on. Such prerequisites lead to usage of the Bayesian method in application to the problems described. The method implies randomization of the initial parameters (factors) and, as a consequence, randomization of the balance and advantage indices. The main goal of the research is to study probabilistic characteristics of the balance and advantage indices assuming that the apriori distributions of the system's factors are known. In the case of independently distributed n- and p-factors, which are random variables, the problem is reduced to studying properties of the distributions' mixtures. As opposed to popular normal mixtures, in Bayesian balance models, the distribution being mixed has a positive support. Special attention is paid to apriori gamma-type distributions, since these distributions are adequate asymptotic approximations of a wide range of probability distributions. The mixtures of exponential, Erlang, and Weibull apriori distributions were considered earlier. In this paper, special attention is paid to the case of Nakagami m-distribution of n- and p-factors (with its particular cases of Rayleigh, Maxwell-Boltzmann, chi-, and other distributions). The explicit formulas for density, distribution functions, and moments of the balance index for different combinations of distributions are obtained.
The results provided in this paper can be applied to many different tasks conserning indices, ratings, and indicators.

Keywords: Bayesian method; mixed distributions; balance index; advantage index; balance process; Nakagami m-distribution

DATA NOISING BY FINITE NORMAL AND GAMMA MIXTURES WITH APPLICATION TO THE PROBLEM OF ROUNDED OBSERVATIONS
  • A. K. Gorshenin   Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: In many real problems, statistical analysis of data containing additional measurement errors, including rounding, is performed, which in some situations can lead to sufficiently significant distortions. In this paper, estimates for an unknown expectation of observations are obtained for one of the possible rounding models under the assumption that the original data are additionally noised with random variables having distributions of the type of finite mixtures of normal and gamma laws. Confidence intervals for an unknown expectation are constructed using the refined estimate for the variance of the integer part of the random variable. An algorithm for determining the value of the parameter of artificial noise, which can be added to the initial data to improve the quality of the method of moving separation of mixtures, is discussed.

Keywords: noisy data; rounded data; finite normal mixtures; finite gamma mixtures; confidence intervals; moving separation of mixtures

ANALYSIS OF CUTTING DAMAGES TO MULTIPOLAR NETWORKS
  • Yu. E. Malashenko   Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation
  • I. A. Nazarova  Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation
  • N. M. Novikova   Dorodnicyn Computing Center, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The method of estimating changes in the functional capabilities of a multipolar flow network system after a damage is proposed. For each sink arc, the maximal flow is calculated, independent of the flow value across the remaining sink arcs. The authors consider cutting structural damages that correspond to removing all arcs forming a minimal cut. The capacity of the cut is equal to the maximal flow along some sink arc. Among the structural damages, the critically dangerous ones are selected with an introduced criterion. For each arc belonging to at least one cutting structural damage, a quantitative characteristic is computed to estimate consequences of its destruction. The described approach is proposed to be used in studying vulnerability of territorially distributed multiuser systems with the network structure in the case of a single-product transfer.

Keywords: single-product flow network; structural vulnerability of network; multipolar flow model

ON THE INSENSITIVITY OF THE STATIONARY DISTRIBUTION OF THE LIMITED RESOURCES QUEUING SYSTEM WITH STATE-DEPENDENT ARRIVAL AND SERVICE RATES
  • E. S. Sopin   Peoples' Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • V. A. Naumov   Service Innovation Research Institute (PIKE), 8A Annankatu, Helsinki 00120, Finland
  • K. Е. Samouylov   Peoples' Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The authors consider further generalization ofthe queuing systems, in which customers require not only a server but also a certain amount of limited resources. In the considered queuing system, arrival and serving intensities depend on the statе of the system. The authors assume an arbitrary distribution of the service time.
The authors prove that the stationary distribution of the system has product form in the case of Poisson arrivals.
Moreover, it was shown that the steady-state probability distribution of number of customers in the system and volumes of occupied resources depends on the service time distribution only through its mean.

Keywords: queueing system; limited resources; insensitivity; service time

RESOURCE QUEUING SYSTEMS AS MODELS OF WIRELESS COMMUNICATION SYSTEMS
  • A. V. Gorbunova   Peoples' Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation
  • V. A. Naumov   Service Innovation Research Institute (PIKE), 8A Annankatu, Helsinki 00120, Finland
  • Yu. V. Gaidamaka   Peoples' Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • K. Е. Samouylov   Peoples' Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article presents an overview of the resource queuing systems used for modeling of a wide class of real systems with admittedly limited resources. Despite the objective importance of studying of such systems, there have been very few works devoted to their analysis until recently, which was due to the complexity of constructing a random process to describe their functioning and, accordingly, of obtaining the numerical results. However, in recent years, there has been a significant shift in the study of the resource systems - new methods for their analysis have been proposed, which made it possible to construct recursive algorithms suitable for the numerical calculations.
In this regard, the current review reflects only a part of the previously obtained results, namely, it considers the resource systems without waiting space with exponentially distributed service time. The authors consider the models of wireless communication systems based on resource queuing systems, expressions for estimating the main probabilistic, and temporal characteristics and algorithms for their calculation.

Keywords: resource queueing systems; continuous resource; discrete resource; limited resource; recursive algorithm; heterogeneous network; stationary distribution; semi-Markov process; wireless communication systems

SUPERVISED LEARNING CLASSIFICATION OF DATA TAKING INTO ACCOUNT PRINCIPAL COMPONENT ANALYSIS
  • M. P. Krivenko  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article examines questions of supervised learning classification of data taking into account principal component analysis (PCA) results. Oonstruction of a Bayesian classifier becomes possible after representation of covariances through the parameters of the probabilistic PCA model. The case of singular data distributions is singled out; for this case, it is suggested to estimate the parameters of the model under constraints on the eigenvalues of covariance matrices. The quality of classification is studied in respect to the actual data dimension.
It is demonstrated that, when correctly assigned, the classifier has the least error probabilities. Exceeding the best value of the dimension usually worsens the quality of the classification to a lesser extent than its underestimation.
The mixture of probabilistic principal component analyzer allows modeling big data by means of a relatively small number of free parameters. The number of free parameters can be controlled by choosing the latent dimension of the data.

Keywords: principal component analysis; mixtures of normal distributions; EM algorithm; supervised learning classification

PARAMETRIZATION IN APPLIED PROBLEMS OF SEARCH OF EMPIRICAL REASONS
  • A. A. Grusho  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • N. A. Grusho  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • M. I. Zabezhailo  A. A. Dorodnicyn Computing Center, Federal Research Center "Computer Sciences and Control" of the Russian Academy of Sciences, 40 Vavilov Str., Moscow 119133, Russian Federation
  • D. V. Smirnov  Sberbank of Russia, 19 Vavilov Str., Moscow 117999, Russian Federation
  • E. E. Timonina  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The authors define description of a finite class of objects in the form of a set of characteristics (parameters) of these objects as parametrization of the considered class. Besides identification of objects, the problem of the causality that some objects have property P exists. For the solution of this task, in the conditions of emergence of new objects, an initial set of characteristics cannot be enough. In this case, it is necessary to change parametrization.
The paper is devoted to creation of methods of changes of the initial parametrization in the problem of specification of the empirical causality of emergence of the property P at expansion of basic data. The constructed methods are shown on practical examples.

Keywords: JSM-methods of artificial intelligence; parametrization of classes of objects; empirical reason; authentication

METHODS AND TOOLS FOR FAULT DETECTION ON ELEMENTS OF HOUSING AND UTILITY INFRASTRUCTURE
  • I. A. Shanin  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • S. A. Stupnikov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • V. N. Zakharov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The work belongs to the area of development of specific information systems based on the Internet of Things technology. An approach for program implementation of a module intended for detection of faults on elements of housing and utility infrastructure is proposed. The module is considered as a part of an information system aimed at technical maintenance of mentioned elements: condition monitoring, predictive maintenance, fault detection, and reporting. Operation algorithms of module components are described: building of operation models for housing and utility infrastructure elements and fault detection. The approach is applied on a couple of datasets for fault detection, experimental results are presented.

Keywords: Internet of Things; data analysis; fault detection; housing and utility infrastructure

IMPLIED KNOWLEDGE: FOUNDATIONS AND TECHNOLOGIES OF EXPLICATION
  • I. M. Zatsman  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The theoretical foundations of the development of information technologies that provide the goal- oriented creation of new linguistic typologies are described. They are formed in the process of contrastive analysis of parallel aligned texts, which are sources of new knowledge of the language. In parallel texts, there are the implications of subjective knowledge of translators that are not represented in the system of modern knowledge of language. Their explication is possible with the help of information technologies, which allow processing parallel texts and goal-oriented extraction of the implied knowledge. The aim of the paper is to describe a new approach to the development of technologies that provide purposefulness and compare it with existing approaches and models.
The conditions under which the growth of objective knowledge (in terms of K. R. Popper) can be technologically ensured are formulated. The proposed approach is illustrated by the example of the task of forming a typology of Russian language constructions with a modal value that arise in the translation of German modal constructions.

Keywords: parallel texts; corpus linguistics; implied knowledge; extraction of new knowledge; emergence; information technology; purposefulness; formation of typologies

STATISTICAL ANALYSIS OF LANGUAGE SPECIFICITY OF CONNECTIVES BASED ON PARALLEL TEXTS
  • O. Yu. Inkova  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • O. Yu. Inkova  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: In recent decades, problems of language specificity in the Russian language attract considerable attention of researchers, although until recently, they have not been thoroughly examined using corpus-based methods. This paper presents a new method of investigating language specificity of Russian connectives based on statistical analysis of annotated parallel texts. Russian-French and French-Russian parallel texts are processed with the help of the Supracorpora Database (SCDB) of Connectives designed specifically for annotation of translation correspondences (TCs) found in parallel texts. Each TC includes annotations of a Russian connective and its translation equivalent (TE), which enables one to obtain statistical data on various translation models (TMs) based on several proposed parameters of language specificity of connectives. As an example, in this work, language specificity of two Russian connectives will be examined: или and а то. Based on the proposed statistical parameters, it will be demonstrated that или has a very low degree of language specificity in the context of the Russian-French language pair, while а то is a highly language-specific connective. The results of this research are applicable to informatics (machine translation and statistical analysis of textual data) and comparative study of languages, such as lexical typology, lexicography, and theory and practice of translation.

Keywords: supracorpora databases; statistical analysis; contrastive corpus analysis; language specificity; parallel corpora; linguistic information resources; connectives; discourse relations; semantics

SEMANTIC PROCESSING OF UNSTRUCTURED TEXTUAL DATA BASED ON THE LINGUISTIC PROCESSOR PullEnti
  • E. B. Kozerenko  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • K. I. Kuznetsov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • and D. A. Romanov  National Research University "Higher School of Economics," 20 Myasnitskaya Str., Moscow 101000, Russian Federation

Abstract: The paper presents the method for creation of knowledge extraction systems based on the approach employing the software tool system PullEnti comprising the algorithms for morphological and semantic-syntactical analysis which makes it possible to extract entities of certain types from natural language texts (persons, organizations, locations, and other target semantic objects). The PullEnti system uses dynamically connected components (plugins) which makes it possible to activate various functions without recompiling. This is how the semantic analysis unit is incorporated. During the analysis, the semantic units (tokens) are established, which are typed phrases: text, numerical data, etc. Examples of implemented projects for different subject areas are given.

Keywords: semantic modeling; named entities recognition, data intensive domains; automated systems of knowledge extraction; semantic search; intelligent Internet technologies

STOCHASTIC DIFFERENTIAL SYSTEM OUTPUT CONTROL BY THE QUADRATIC CRITERION. I. DYNAMIC PROGRAMMING OPTIMAL SOLUTION
  • A. V. Bosov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • A. I. Stefanovich  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The problem of optimal control for the Ito diffusion process and a controlled linear output is solved. The considered statement is close to the classical linear-quadratic Gaussian control (LQG control) problem. Differences consist in the fact that the state is described by the nonlinear differential Ito equation dyy = At(yt) dt + >t(yt) dvt and does not depend on the control ut, optimization subject is controlled linear output dzt = atyt dt + btzt dt + ct ut dt + at dwt. Additional generalizations are included in the quadratic quality criterion for the purpose of statement such problems as state tracking by output or a linear combination of state and output tracking by control. The method of dynamic programming is used for the solution. The assumption about Bellman function in the form Vt(y, z) = atz2 + @t(y)z + Yt(y) allows one to find it. Three differential equations for the coefficients at, @t(y), and Yt(y) give the solution. These equations constitute the optimal solution of the problem under consideration.

Keywords: stochastic differential equation; optimal control; dynamic programming; Bellman function; Riccati equation; linear differential equations of parabolic type

MODEL OF TRANSPORTATION OF TRAINS AND SHUNTING LOCOMOTIVES AT A RAILWAY STATION FOR EVALUATION AND ANALYSIS OF SIDE-COLLISION PROBABILITY
  • A. V. Bosov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
  • A. N. Ignatov  Moscow Aviation Institute (National Research University), 4 Volokolamskoe Shosse, Moscow 125993, Russian Federation
  • A. V. Naumov  Moscow Aviation Institute (National Research University), 4 Volokolamskoe Shosse, Moscow 125993, Russian Federation

Abstract: A mathematical model for solution of the shunting locomotives traffic control problem is proposed for a fixed schedule of passenger/freight trains traffic across a station and a fixed time-table of shunting operations: set off and attaching of cars, output operation and deconsolidation of trains. The model is used for formulation and solution of the problem to minimize time of shunting locomotive transportation across the station to perform next scheduled operation with respect to busy condition of some tracks for transportation owing to presence of passenger/freight trains on them and with respect to restriction on shunting operation execution time. The original statement is reduced to mixed integer linear programming. The presented model was used for evaluation of side-collision probability at a railway station with respect to possible random drag in passenger trains traffic. The results of numerical experiments are presented.

Keywords: simulation model; schedule; intensity; mixed integer linear programming

FILTERING OF MARKOV JUMP PROCESSES BY DISCRETIZED OBSERVATIONS
  • A. V. Borisov  Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article is devoted to a solution of the optimal filtering problem of a homogenous Markov jump process state. The available observations represent time increments of the integral transformations of the Markov state corrupted by Wiener processes. The noise intensity is also state-dependent. At the instant of the consecutive observation obtaining, the optimal estimate is calculated recursively as a function of previous estimate and the new observation, meanwhile between observations the filtering estimate is a simple forecast by virtue of the Kolmogorov differential system. The recursion is rather expensive because of need to calculate the integrals, which are the location-scale mixtures of Gaussians. The mixing distributions represent the occupation of the state in each of possible values during the mid-observation intervals. The paper contains numerically cheaper approximations, based on the restriction of the state transitions number between the observations. Both the local and global characteristics of approximation accuracy are obtained as functions of the dynamics parameters, mid-observation interval length, and upper bound of transitions number.

Keywords: Markov jump process; optimal filtering; multiplicative observation noises; stochastic differential equation; numerical approximation