Informatics and Applications scientific journal
Volume 14, Issue 1, 2020

ASYMPTOTIC REGULARITY OF THE WAVELET METHODS OF INVERTING LINEAR HOMOGENEOUS OPERATORS FROM OBSERVATIONS RECORDED AT RANDOM TIMES
Abstract: When solving inverse statistical problems, it is often necessary to invert some linear homogeneous operator and it is usually necessary to use regularization methods, since the observed data are noisy. Popular methods for noise suppression are the procedures of thresholding the expansion coefficients of the observed function. The advantages of these methods are their computational efficiency and the ability to adapt to both the type of operator and the local features of the estimated function. An analysis of the errors of these methods is an important practical task, since it allows one to evaluate the quality of both the methods themselves and the equipment used. Sometimes, the nature of the data is such that observations are recorded at random times. If the observation points form a variational series constructed from a sample of a uniform distribution on the data recording interval, then the use of conventional threshold processing procedures is adequate. The present author analyzes the estimate of the mean square risk in the problem of inversion of linear homogeneous operators and demonstrates that under certain conditions, this estimate is strongly consistent and asymptotically normal.
Keywords: threshold processing; linear homogeneous operator; random observation points; mean square risk estimate

ANALYSIS OF CONFIGURATIONS OF LSTM NETWORKS FOR MEDIUM-TERM VECTOR FORECASTING
Abstract: The paper analyzes 36 configurations of LSTM (long short-term memory) architectures for forecasting up to 70 steps ahead from data sets of 300-500 elements. For probabilistic approximation of observations, a model based on finite normal mixtures is used; therefore, the mathematical expectation, variance, skewness, and kurtosis of these mixtures are used as initial data for forecasting. The optimal configurations of neural networks were determined and the practical possibility of constructing high-quality medium-term forecasts with a limited training time was demonstrated. The results obtained are important for the development of a probabilistic-statistical approach to the description of the evolution of turbulent processes in a magnetically active high-temperature plasma.
Keywords: LSTM; forecasting; deep learning; high-performance computing; CUDA

NUMERICAL SCHEMES OF MARKOV JUMP PROCESS FILTERING GIVEN DISCRETIZED OBSERVATIONS II: ADDITIVE NOISE CASE
Abstract: The note is a sequel to the investigations initiated in the article Borisov, A. 2019. Numerical schemes of Markov jump process filtering given discretized observations I: Accuracy characteristics. Inform. Appl. 13(4):68-75.
Keywords: Markov jump process; optimal filtering; additive and multiplicative observation noises; stochastic differential equation; analytical and numerical approximation

STOCHASTIC DIFFERENTIAL SYSTEM OUTPUT CONTROL BY THE QUADRATIC CRITERION. IV. ALTERNATIVE NUMERICAL DECISION
Abstract: In the study of the optimal control problem for the Ito diffusion process and the controlled linear output with a quadratic quality criterion, an intermediate result is summarized: for the approximate calculation of the optimal solution, an approach based on computer simulation is proposed as an alternative to the classical numerical integration method.
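As a generic illustration of simulation-based calculation (not the scheme of the paper itself), the following minimal sketch estimates the solution of a toy parabolic equation by Monte Carlo through the Feynman-Kac representation. The equation u_t + 0.5 u_xx = 0 with terminal condition u(T, x) = x^2, the function names `feynman_kac_estimate` and `exact_solution`, and all parameter values are assumptions chosen only so that the estimate can be checked against a closed form.

```python
import math
import random


def feynman_kac_estimate(x, t, T, terminal, n_paths=200_000, seed=0):
    """Monte Carlo estimate of u(t, x) = E[terminal(x + W_{T-t})].

    By the Feynman-Kac formula, this u solves u_t + 0.5 * u_xx = 0
    with the terminal condition u(T, x) = terminal(x).
    (Toy problem assumed for illustration, not taken from the paper.)
    """
    rng = random.Random(seed)
    scale = math.sqrt(T - t)  # standard deviation of the Brownian increment
    total = 0.0
    for _ in range(n_paths):
        total += terminal(x + scale * rng.gauss(0.0, 1.0))
    return total / n_paths


def exact_solution(x, t, T):
    # Closed form for terminal(x) = x**2: E[(x + W)^2] = x^2 + (T - t).
    return x * x + (T - t)


estimate = feynman_kac_estimate(1.0, 0.0, 1.0, lambda y: y * y)
exact = exact_solution(1.0, 0.0, 1.0)
print(f"Monte Carlo: {estimate:.4f}, exact: {exact:.4f}")
```

The sampling error decreases as the inverse square root of `n_paths`, so the sketch trades accuracy for simplicity compared with a grid-based integration of the same equation.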
Keywords: stochastic differential equation; optimal control; Bellman function; linear differential equations of parabolic type; Kolmogorov equation; Feynman-Kac formula; computer simulations; Monte-Carlo method

ALIGNMENT OF ORDERED SET CARTESIAN PRODUCT
Abstract: The work is devoted to the study of metric methods for analyzing objects with complex structure. It proposes to generalize the dynamic time warping (DTW) method for two time series to the case of objects defined on two or more time axes. In the discrete representation, such objects are matrices: an object is defined as a matrix whose rows and columns correspond to the time axes. The DTW method for time series is generalized as a method of dynamic matrix alignment. The paper proposes a distance function resistant to monotonic nonlinear deformations of the Cartesian product of two time scales. The alignment path between objects is defined. The properties of the proposed distance function are investigated. To illustrate the method, the problems of metric classification of objects are solved on model data and data from the MNIST dataset.
Keywords: distance function; dynamic alignment; distance between matrices; nonlinear time warping; space-time series

NEUROPHYSIOLOGY AS A SUBJECT DOMAIN FOR DATA INTENSIVE PROBLEM SOLVING
Abstract: The goal of this survey is to analyze neurophysiology as a data intensive domain. Nowadays, the number of studies of the human brain is increasing. International projects and studies aim to improve the understanding of human brain function. The amount of data obtained in typical laboratories in the field of neurophysiology is growing exponentially. The data are represented in a large number of different formats.
Keywords: neurophysiology; neurophysiological resources; neuroinformatics; data intensive research; analysis of neurophysiological data

RISK-NEUTRAL DYNAMICS FOR THE ARIMA-GARCH RANDOM PROCESS WITH ERRORS DISTRIBUTED ACCORDING TO THE JOHNSON'S SU LAW
Abstract: The risk-neutral world is one of the fundamental principles of financial mathematics, used to determine the fair value of derivative financial instruments. The article deals with the construction of risk-neutral dynamics for the ARIMA-GARCH (Autoregressive Integrated Moving Average, Generalized AutoRegressive Conditional Heteroskedasticity) random process with errors distributed according to the Johnson's SU law. Methods for finding risk-neutral coefficients (for example, the Esscher transform and the extended Girsanov principle) require the existence of a moment generating function. A moment generating function is not known for the Student and Johnson's SU distributions. The authors construct a moment generating function for the Johnson's SU distribution and prove that a modification of the extended Girsanov principle can yield a risk-neutral measure with respect to the chosen distribution.
Keywords: ARIMA; GARCH; risk-neutral measure; Girsanov extended principle; Johnson's SU; option pricing

IMPROVEMENT OF THE ACCURACY OF SOLUTION OF TASKS FOR THE ACCOUNT OF THE CONSTRUCTION OF BOUNDARY CONDITIONS
Abstract: The paper considers the stability of solutions of inverse problems with respect to the exact setting of boundary conditions. In practical applications, the theoretical form of the functional dependence of the boundary conditions is, as a rule, undefined or unknown, and there are also random measurement errors. Studies have shown that this leads to a significant reduction in the accuracy of solving the inverse problem. In order to increase the accuracy of solving inverse problems, it was proposed to refine the functional form of the boundary conditions by recognizing the form of the mathematical model of the dependence and then approximating the behavior of the physical quantity at the boundary by this function. Dependency recovery was performed using dependency recognition methods based on structural difference schemes and inverse mapping recognition. Model examples of implementation in the presence of additive random measurement errors and an unknown type of dependence of the boundary conditions are given.
Keywords: inverse problem; recognition; functional dependence; model; difference schemes; inverse function; sampling; variance; approximation

ON METHODS FOR IMPROVING THE ACCURACY OF MULTICLASS CLASSIFICATION ON IMBALANCED DATA
Abstract: This paper studies methods for overcoming class imbalance in order to achieve classification accuracy higher than that obtained by applying classification algorithms directly to imbalanced data. A scheme for improving classification accuracy is proposed. It combines classification algorithms with feature selection methods such as RFE (Recursive Feature Elimination), Random Forest, and Boruta, after the classes have first been balanced by random sampling methods, SMOTE (Synthetic Minority Oversampling TEchnique), and ADASYN (ADAptive SYNthetic sampling). Computer experiments on skin disease data showed that using sampling algorithms to eliminate class imbalance, together with selecting the most informative features, significantly increases the accuracy of the classification results. The highest classification accuracy was achieved by the Random Forest algorithm on data resampled with the ADASYN algorithm.
Keywords: imbalanced data; classification; sampling; random forest; ADASYN; SMOTE

MODELING OF MONITORING OF INFORMATION SECURITY PROCESS ON THE BASIS OF QUEUING SYSTEMS
Abstract: The paper is devoted to the mathematical modeling of the monitoring process performed by information security systems and aimed at detecting hidden malicious attacks. The modeling is based on the queueing theory formalism. The monitoring process is reduced to the analysis of the customer flow arriving at the queueing system, in which each customer is regarded as carrying potential malicious attacks. Functional relations between the system state probability distribution and the distribution of the number of undetected malicious attacks at service completion epochs are obtained. These characteristics may allow one to improve the efficiency of the malicious attack detection process in data processing systems.
Keywords: protection of information; information security; queuing system; probability

ON CAUSAL REPRESENTATIVENESS OF TRAINING SAMPLES OF PRECEDENTS IN DIAGNOSTIC TYPE TASKS
Abstract: The work focuses on some features of causality analysis in data mining tasks. The possibilities of using so-called open logic theories in diagnostic (classification) tasks to describe replenished sets of empirical data are discussed. In tasks of this type, it is necessary to establish (predict, diagnose, etc.) the presence or absence of a target property in a new precedent given by a description in the same presentation language of heterogeneous data, which describes examples having a target property and counter-examples not having a target property. A variant of constructing open theories that describe collections of precedents by means of special logical expressions - characteristic functions - is presented. Characteristic functions make it possible to eliminate heterogeneity in the descriptions of precedents. A procedure for forming the characteristic functions of a training sample of precedents is proposed. The properties of characteristic functions and some conditions of their existence are studied.
Keywords: diagnostics; causal analysis; intelligent data analysis; open logic

PERFORMANCE OF THE BOUNDED PIPELINE
Abstract: The paper is devoted to studying the performance of a bounded pipeline, that is, a computational pipeline in which the number of active stages is bounded at any time by a fixed number. Bounded pipelines with a given sum and a given maximum of stage delays are considered. The stages can have different delays. The main problem is to build an analytical model for calculating the processing time of a given amount of data using this bounded pipeline. The solution is simplified if the constraint is treated as a structural pipeline hazard. This analytical model is constructed for the case when the operation of a bounded pipeline has the property of continuity of processing for each input element. For such pipelines, the paper proves the conjecture that the minimum number of processors at which the greatest productivity is achieved is equal to the smallest integer not less than the ratio of the sum of stage delays to the maximum delay. It is established that if the property of continuity is not required, then this conjecture is not true. The constructed model can be used to synchronize the operation of the stages of a bounded pipeline with the continuity property. If the property of continuity is not required, the result is an asynchronous bounded pipeline in which the stages are synchronized on the basis of data readiness. Software based on the theory of trace monoids has been developed that allows one to calculate the processing time with an asynchronous bounded pipeline.
Keywords: computational pipeline; trace monoid; Foata normal form; pipeline performance; structural hazard

METHOD FOR DEFINING FINITE NONCOMMUTATIVE ASSOCIATIVE ALGEBRAS OF ARBITRARY EVEN DIMENSION FOR DEVELOPMENT OF THE POSTQUANTUM CRYPTOSCHEMES
Abstract: The paper introduces a new unified method for defining finite noncommutative associative algebras of arbitrary even dimension m and describes the investigated properties of the algebras for the cases m = 4 and 6, when the algebras are defined over the ground field GF(p) with a large prime p. Formulas describing the set of p^2 (p^4) global left-sided units contained in the 4-dimensional (6-dimensional) algebra are derived. Only local invertibility takes place in the algebras investigated. Formulas for computing the unique local two-sided unit related to the fixed locally invertible vector are derived for each of the algebras. A new form of the hidden discrete logarithm problem is proposed as a postquantum cryptographic primitive. The latter was used to develop the postquantum digital signature scheme.
Keywords: finite noncommutative algebra; associative algebra; computationally difficult problem; discrete logarithm; digital signature; postquantum cryptography

SIMULTANEOUS LOCALIZATION AND MAPPING METHOD IN THREE-DIMENSIONAL SPACE BASED ON THE COMBINED SOLUTION OF THE POINT-POINT VARIATION PROBLEM ICP FOR AN AFFINE TRANSFORMATION
Abstract: Simultaneous localization and mapping is a problem in which frame data are used as the only source of external information to define the position of a moving camera in space and, at the same time, to reconstruct a map of the study area. Nowadays, this problem is considered solved for the construction of two-dimensional maps for small static scenes using range sensors such as lasers or sonar. However, for dynamic, complex, and large-scale scenes, the construction of an accurate three-dimensional map of the surrounding space is an active area of research. To solve this problem, the authors propose a solution of the point-point problem for an affine transformation and develop a fast iterative algorithm for registering point clouds in three-dimensional space. The performance and computational complexity of the proposed method are presented and discussed using reference data as an example. The results can be applied to real-time navigation tasks of a mobile robot.
Keywords: registration problem; localization; simultaneous localization and mapping; affine transformation; two-dimensional descriptors; iterative closest point

ANALYTICAL TEXTOLOGY IN INTELLIGENT PROCESSING SYSTEMS FOR UNSTRUCTURED DATA
Abstract: The paper presents a new field of research at the intersection of linguistics, computer science, and philology. It involves logical and statistical methods of analyzing unstructured data in the form of natural language texts in order to solve a number of tasks: extracting explicit and implicit knowledge from texts using a semantics-oriented linguistic processor; forming lexical statistical representations of texts; building analytical conclusions; discovering the author's idiostyle and the textual similarity of literary works based on the analysis of service words and other microtext elements; identifying the sentiment of texts; and building a full profile of the author's text based on the superposition of methods. An example of the textological analysis of the "Blue Book" of the "Petersburg Diary" by Zinaida Hippius is considered.
Keywords: natural language processing; statistical methods; cognitive technology; lexical semantic analysis; knowledge extraction from texts; analytical systems

INCAPSULATION OF SEMANTIC REPRESENTATIONS INTO ELEMENTS OF A GRAMMAR
Abstract: The article proposes a new mathematical apparatus of natural language representation for computational linguistics: morphology, syntax, and semantics are described as objects of discrete mathematics forming a hierarchy and an integral information system. The proposed constructive language theory is a new approach to language learning based on separating the domains of syntax and semantics, constructing autonomous models of syntax and semantics, and treating language formation as a mapping between elements of two sets: syntax and semantics.
Keywords: natural language; graph; syntax; semantics; lexicon; word form; morphological feature; lexical group; dictionary; sentence; algorithm

INFORMATION FUSION OF DOCUMENTS
Abstract: The paper considers the problems associated with the creation of an expert base of documents that require prompt processing of incoming information and, as a consequence, restructuring of the knowledge base. The authors propose procedures that reduce the search for the optimal consistent state of interrelated documents. An approach to assessing the relationship of text documents and informational messages as poorly structured objects was developed. The practical implementation of this approach is described.
Keywords: information fusion; controlled data and knowledge consistency; knowledge base restructuring
Phone of the Center: +7 (499) 135-62-60
E-mail of the Center: ipiran@ipiran.ru