Informatics and Applications scientific journal
Volume 7, Issue 3, 2013
Content
UNSUPERVISED APPROACH TO WEB WRAPPER MAINTENANCE.
- A.M. Andreev Bauman Moscow State Technical University, arkandreev@gmail.com
- D. V. Berezkin Bauman Moscow State Technical University, dmitryb2007@yandex.ru
- I.A. Kozlov Bauman Moscow State Technical University, kozlovilya89@gmail.com
- K. V. Simakov Bauman Moscow State Technical University, skv@ixlab.ru
Abstract: HTML-wrapper applications rely on formatting regularities of the targeted websites. Maintaining such
applications therefore requires detecting markup changes in web pages. This article describes an
unsupervised approach to this problem. The proposed detection method consists of two parts: a real-time part
based on clustering, which treats an HTML document as a vector of features, and a time-lagged part based on
comparing the distributions of these features between the learning and testing sets of HTML documents.
Several experiments were carried out with data obtained from a real wrapper. The results demonstrate the
feasibility of the suggested approach.
Keywords: wrapper maintenance; web-site parsing; clustering; HTML-markup statistical processing
BUILDING REAL-TIME NEWS RECOMMENDATION SERVICE USING NoSQL DBMS.
- P.A. Klemenkov M.V. Lomonosov Moscow State University, parser@cs.msu.su
Abstract: The analysis of user interaction with a Web application, the methods of conducting such an analysis, and their
shortcomings are discussed. An implementation of the news recommendation service using existing approaches is
described. A new NoSQL approach to building recommendation systems that operate in near real time is suggested.
Keywords: recommendation systems; minhash; mapreduce; NoSQL
A VERIFIABLE MAPPING OF A MULTIDIMENSIONAL ARRAY DATA MODEL INTO AN OBJECT DATA MODEL.
- S.A. Stupnikov IPI RAN, ssa@ipi.ac.ru
Abstract: The paper considers a mapping of a multidimensional array data model into an object data model. General
principles of mappings of array data models into object data models are formulated. A mapping of concrete models
is also considered. The source model is the Array Data Model used in the SciDB DBMS. The target model is
the SYNTHESIS language used as the canonical data model in the subject mediation technology. A method for
verification of the mapping is considered. Verification means a formal proof that the mapping preserves information
and semantics of the operations. Verification is realized using the AMN formal specification language. A practical
aim of the paper is to provide a basis for virtual or materialized integration of array-based information resources.
Keywords: multidimensional arrays; object data model; data model mapping; database integration
STUDY OF THE WIKIPEDIA(EN) CATEGORIES GRAPH.
- A. V. Shkotin GIS department, State Geological Museum of the Russian Academy of Sciences, ashkotin@acm.org
Abstract: Wikipedia is an outstanding project for accumulating knowledge, both general and specialized.
Checking the quality of this knowledge, especially automatically, is very important. This paper presents
the results of studying the structure of the English version of the WCG (Wikipedia Categories Graph) as a whole.
The WCG is a system that supports the structure of knowledge, so it is of interest to know what the WCG includes
and how it is arranged. It is shown that the graph contains unacceptable logical violations; organizational and
technical methods for eliminating them are discussed.
Keywords: Wikipedia; digraph; connected components; logical analysis
ACTIVE AUTHENTICATION METHODS USING KEYSTROKE DYNAMICS.
- V. Yu. Kaganov Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University,
vladhid@mlab.cs.msu.su
- A.K. Korolyov Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University,
akorolev@mlab.cs.msu.su
- M.N. Krylov Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University,
krylovm@mlab.cs.msu.su
- I. V. Mashechkin Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University,
mash@cs.msu.su
- M. I. Petrovskiy Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University,
michael@cs.msu.su
Abstract: An overview of some effective authentication methods that use behavior models built from keystroke
dynamics data is presented. In addition, a new data representation model is proposed, and a number of
experiments using this model and various machine learning algorithms are reported.
Keywords: wavelets; thresholding; risk estimate; normal distribution; rate of convergence
PROBLEMS OF THE ONLINE ACCESS TO SCIENTIFIC JOURNALS.
- A. V. Glushanovskii Library for Natural Sciences, Russian Academy of Sciences, avglush@benran.ru
- N. E. Kalenov Library for Natural Sciences, Russian Academy of Sciences, nek@benran.ru
Abstract: The problems of providing the institutions of the Russian Academy of Sciences (RAS) with Internet access
to full-text scientific information are considered. In world practice, this task is handled by scientific
libraries and library consortia in order to obtain the best financial conditions. The practice of organizing such
access in Russia via the Russian Foundation for Basic Research and the National Electronic Information Consortium
(NEICON) is described. Statistics on the use of NEICON-provided online journals by RAS staff are considered.
Organizational proposals for an optimal solution to the task of online access to scientific information under the
financial constraints of RAS are suggested.
Keywords: scientific journals; full texts; Internet; remote access; libraries; consortia
DECISION SUPPORT SYSTEMS MODELING WITH SYNERGETIC ARTIFICIAL INTELLIGENCE.
- I. A. Kirikov Kaliningrad Branch of Institute of Informatics Problems, Russian Academy of Sciences, baltbipiran@mail.ru
- A. V. Kolesnikov Kaliningrad Branch of Institute of Informatics Problems, Russian Academy of Sciences, avkolesnikov@yandex.ru
- S. V. Listopad Kaliningrad Branch of Institute of Informatics Problems, Russian Academy of Sciences, ser-list-post@yandex.ru
Abstract: The approach to modeling collective effects of decision support systems within the paradigm of synergetic artificial
intelligence is considered. The model and the functional structure of the hybrid intelligent multiagent system for modeling decision support systems are proposed. The results of computational experiments that demonstrate a
positive impact of the self-organization effect on the quality of collective decisions are presented.
Keywords: decision support computer system; hybrid intelligent multiagent system with self-organization
SEMANTICS OF ASPECT-ORIENTED MODELING OF DATA AND PROCESSES.
- S. P. Kovalyov Institute of Control Problems, Russian Academy of Sciences, kovalyov@nm.ru
Abstract: An approach to semantic unification of aspect-oriented programming (AOP) technologies based on formalization
by means of category theory is presented. An aspect-oriented programming technology is represented as a category of
formal models of aspect-oriented programs and their interconnections, equipped with a functor that extracts the
aspectual structure (labeling of models by concerns). Weaving of aspect-oriented programs is formalized as a certain
universal construction in this category. Formal AOP technologies applicable to reducing the cost of modeling data and
process scenarios are defined and considered. An existence condition for weaving of scenario models is stated and justified.
Keywords: aspect-oriented programming; category theory; aspect weaving
COGNITIVE INTEROPERABILITY OF EXPERT COLLABORATION IN THE TASK
OF THE RUSSIAN-FRENCH PARALLEL TEXTS PROCESSING:
LINGUISTIC AND COGNITIVE ASPECTS.
- O. S. Kozhunova IPI RAN, kozhunovka@mail.ru
Abstract: The resources of information and communication technologies “Refillable linguistic data base on translation
difficulties” and “Subject-oriented thesaurus of Russian-French parallel texts” are discussed. The resources are
at the design stage and are to be implemented simultaneously with the Russian-French parallel corpus of belles-lettres.
Apart from the functionality, linguistic and cognitive aspects of expert interaction within the task of
processing Russian-French parallel texts through cooperative efforts are considered.
Keywords: cognitive interoperability; task of natural language processing; Russian-French parallel texts
DATA ACQUISITION SIMULATION FOR NICA EXPERIMENT.
- V. V. Korenkov Joint Institute for Nuclear Research, Laboratory of Information Technologies Dubna, korenkov@cv.jinr.ru
- A. V. Nechaevskiy Joint Institute for Nuclear Research, Laboratory of Information Technologies Dubna,
Andrey.Nechaevskiy@gmail.com
- V. V. Trofimov Joint Institute for Nuclear Research, Laboratory of Information Technologies Dubna, trofimov@jinr.ru
Abstract: The need for a simulation model of data storage and processing for the NICA accelerator complex is shown.
The simulation model is based on GridSim. This paper describes an approach to simulating dCache and the network.
A simple example illustrates the use of the model.
Keywords: grid technologies; grid infrastructures; data storage systems; optimization; simulation; research;
development; dCache; Tier1; NICA; Grid
ESTIMATES OF THE RATE OF CONVERGENCE OF THE DISTRIBUTIONS OF SOME RANDOM SUMS TO STABLE LAWS.
- V. Yu. Korolev Faculty of Computational Mathematics and Cybernetics, M.V. Lomonosov Moscow State University; IPI RAN,
vkorolev@cs.msu.su
- L. M. Zaks Department of Modeling and Mathematical Statistics, Alpha-Bank, lily.zaks@gmail.com
Abstract: Estimates are presented for the rate of convergence of the distributions of special sums of independent identically
distributed random variables with finite variances to symmetric strictly stable laws. The distribution of the random
index is assumed to be mixed Poisson in which the mixing distribution is a stable law concentrated on the positive
half-line. The absolute constants are written out explicitly.
Keywords: stable distribution; Berry–Esseen inequality; random sum; doubly stochastic Poisson process (Cox
process); mixed Poisson distribution
UNIVERSAL METRIC THESAURUS OF RUSSIAN LANGUAGE.
- L. A. Kuznetsov Russian Presidential Academy of National Economy and Public Administration (Lipetsk Branch),
kuznetsov.leonid48@gmail.com
- V. F. Kuznetsova Russian Presidential Academy of National Economy and Public Administration (Lipetsk Branch),
kuznetsov.leonid48@gmail.com
- A. V. Kapnin Lipetsk State Technical University, gert@inbox.ru
Abstract: All available Russian-language thesauri are compiled by expert groups. This paper presents tools for
automatic thesaurus generation. The tools are based on a formal representation of the texts explaining the
semantics of words and on a quantitative assessment of the semantic distance between words as a measure of their
proximity. The proposed solutions make it possible to use formal mathematical representations that minimize
subjectivity in assessing the proximity of words. They also make it possible to build automatic systems for
evaluating the semantic proximity of words and to solve other problems in the area of text processing.
Keywords: computational linguistics; universal thesaurus; metric thesaurus; semantic proximity assessment;
semantic distance; information theory
APPROXIMATION OF A MULTIDIMENSIONAL DEPENDENCY BASED ON LINEAR EXPANSION
IN A DICTIONARY OF PARAMETRIC FUNCTIONS.
- M. G. Belyaev Institute for Information Transmission Problems RAS, Moscow Institute of Physics and Technology, Datadvance
LLC, belyaev@iitp.ru
- E. V. Burnaev Institute for Information Transmission Problems RAS, Moscow Institute of Physics and Technology, Datadvance
LLC, burnaev@iitp.ru
Abstract: The problem of a multidimensional function approximation using a finite set of pairs “point”–“function value at
this point” is considered. As a model for the function, an expansion in a dictionary containing nonlinear parametric
functions is used. Several subproblems should be solved when constructing an approximation based on such a
model: extraction of a validation sample, initialization of parameters of the functions from the dictionary, and
tuning of these parameters. Efficient methods for solving these subproblems have been suggested. Efficiency of the
proposed approach is demonstrated on some problems of engineering design.
Keywords: nonlinear approximation; parametric dictionaries