Informatics and Applications
2021, Volume 15, Issue 3, pp 41-50
ROUTING JOBS TO HETEROGENEOUS PARALLEL QUEUES USING DISTRIBUTED POLICY GRADIENT ALGORITHM
- M. G. Konovalov
- R. V. Razumchik
Abstract
The problem of dispatching to heterogeneous servers, operating independently in parallel, is considered.
Each server has a single processor and a dedicated FIFO (first in, first out) queue of infinite capacity. Homogeneous jobs (without precedence constraints) arrive one by one at the dispatcher, which immediately makes a routing decision. Both the interarrival times and the sizes of jobs are assumed to be independent and identically distributed random variables with general distributions. When making a decision, the dispatcher has full information about the current state of the system, including the size of the arriving job. The problem is to minimize the system's long-run mean response time. A new sample-path-based policy gradient algorithm is proposed for constructing such a routing policy. Its main ingredients are a dynamically changing discretization of the continuous state space and individual policy gradient algorithms acting in each cell. Simple numerical examples demonstrate that the new algorithm can outperform the best known solutions and is applicable in quite general settings.
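The abstract only outlines the method, so a minimal illustration of the general idea may help. The Python sketch below is not the authors' algorithm: the uniform bin width, the softmax policy parameterization, the REINFORCE-style update with a running-average baseline, and every name in it (CellwiseDispatcher, ALPHA, the two-server simulation) are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): discretize the
# continuous state (server workloads + arriving job size) into cells and
# run an independent softmax policy-gradient learner in each cell.

ALPHA = 0.01  # policy-gradient step size (assumed value)

class CellwiseDispatcher:
    def __init__(self, n_servers, bin_width=1.0):
        self.n = n_servers
        self.bin_width = bin_width
        self.theta = {}  # cell -> softmax preferences over the servers

    def _cell(self, workloads, job_size):
        # Crude uniform discretization of the continuous state; the paper
        # uses a dynamically changing discretization instead.
        key = [int(w // self.bin_width) for w in workloads]
        key.append(int(job_size // self.bin_width))
        return tuple(key)

    def _probs(self, theta):
        z = np.exp(theta - theta.max())  # numerically stable softmax
        return z / z.sum()

    def route(self, workloads, job_size):
        cell = self._cell(workloads, job_size)
        theta = self.theta.setdefault(cell, np.zeros(self.n))
        action = np.random.choice(self.n, p=self._probs(theta))
        return cell, action

    def update(self, cell, action, cost, baseline):
        # REINFORCE-style step for the learner owning this cell: lower the
        # preference for an action whose observed cost (response time)
        # exceeded the baseline, raise it otherwise.
        theta = self.theta[cell]
        grad = -self._probs(theta)
        grad[action] += 1.0  # gradient of log pi(action | cell)
        theta -= ALPHA * (cost - baseline) * grad

# Tiny usage sketch: two servers with speeds 1.0 and 0.5, Poisson arrivals,
# exponential job sizes (all rates and distributions here are assumptions;
# the paper allows general distributions).
rng = np.random.default_rng(0)
speeds = np.array([1.0, 0.5])
dispatcher = CellwiseDispatcher(n_servers=2)
workloads = np.zeros(2)   # unfinished work in each queue
avg_cost, t = 0.0, 0      # running average response time as baseline
for _ in range(100_000):
    dt = rng.exponential(1.0 / 1.2)                       # interarrival time
    workloads = np.maximum(workloads - dt * speeds, 0.0)  # servers drain work
    job = rng.exponential(1.0)                            # arriving job size
    cell, a = dispatcher.route(workloads, job)
    response = (workloads[a] + job) / speeds[a]  # FIFO waiting + service time
    workloads[a] += job
    t += 1
    avg_cost += (response - avg_cost) / t
    dispatcher.update(cell, a, response, avg_cost)
```

Keeping a separate learner per cell, as above, is what makes such a scheme distributed: each cell's parameters are updated only from sample-path costs of jobs routed while the system state fell into that cell.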
About this article
Title
ROUTING JOBS TO HETEROGENEOUS PARALLEL QUEUES USING DISTRIBUTED POLICY GRADIENT ALGORITHM
Journal
Informatics and Applications
2021, Volume 15, Issue 3, pp 41-50
Cover Date
2021-09-30
DOI
10.14357/19922264210306
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
heterogeneous parallel queues; Markov chains with continuous state space; sojourn time optimization
Authors
M. G. Konovalov and R. V. Razumchik
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation