Informatics and Applications
2021, Volume 15, Issue 3, pp 41-50
ROUTING JOBS TO HETEROGENEOUS PARALLEL QUEUES USING DISTRIBUTED POLICY GRADIENT ALGORITHM
- M. G. Konovalov
- R. V. Razumchik
Abstract
The problem of dispatching to heterogeneous servers, operating independently in parallel, is considered.
Each server has a single processor and a dedicated FIFO (first in, first out) queue of infinite capacity. Homogeneous jobs (without precedence constraints) arrive one by one at the dispatcher, which immediately makes a routing decision. Both the interarrival times and the sizes of jobs are assumed to be independent and identically distributed random variables with general distributions. When making a decision, the dispatcher has full information about the current state of the system, including the size of the arriving job. The problem is to minimize the system's long-run mean response time. A new sample-path-based policy gradient algorithm is proposed for constructing such a routing policy. Its main ingredients are a dynamically changing discretization of the continuous state space and individual policy gradient algorithms acting in each cell. Simple numerical examples demonstrate that the new algorithm can outperform the best known solutions and is applicable in quite general settings.
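The abstract only outlines the method, so a minimal illustration of the general idea may help. The Python sketch below is not the authors' algorithm: the uniform bin width, the softmax policy parameterization, the REINFORCE-style update with a running-average baseline, and every name in it (CellwiseDispatcher, ALPHA, the two-server simulation) are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): discretize the
# continuous state (server workloads + arriving job size) into cells and
# run an independent softmax policy-gradient learner in each cell.

ALPHA = 0.01  # policy-gradient step size (assumed value)

class CellwiseDispatcher:
    def __init__(self, n_servers, bin_width=1.0):
        self.n = n_servers
        self.bin_width = bin_width
        self.theta = {}  # cell -> softmax preferences over the servers

    def _cell(self, workloads, job_size):
        # Crude uniform discretization of the continuous state; the paper
        # uses a dynamically changing discretization instead.
        key = [int(w // self.bin_width) for w in workloads]
        key.append(int(job_size // self.bin_width))
        return tuple(key)

    def _probs(self, theta):
        z = np.exp(theta - theta.max())  # numerically stable softmax
        return z / z.sum()

    def route(self, workloads, job_size):
        cell = self._cell(workloads, job_size)
        theta = self.theta.setdefault(cell, np.zeros(self.n))
        action = np.random.choice(self.n, p=self._probs(theta))
        return cell, action

    def update(self, cell, action, cost, baseline):
        # REINFORCE-style step for the learner owning this cell: lower the
        # preference for an action whose observed cost (response time)
        # exceeded the baseline, raise it otherwise.
        theta = self.theta[cell]
        grad = -self._probs(theta)
        grad[action] += 1.0  # gradient of log pi(action | cell)
        theta -= ALPHA * (cost - baseline) * grad

# Tiny usage sketch: two servers with speeds 1.0 and 0.5, Poisson arrivals,
# exponential job sizes (all rates and distributions here are assumptions;
# the paper allows general distributions).
rng = np.random.default_rng(0)
speeds = np.array([1.0, 0.5])
dispatcher = CellwiseDispatcher(n_servers=2)
workloads = np.zeros(2)   # unfinished work in each queue
avg_cost, t = 0.0, 0      # running average response time as baseline
for _ in range(100_000):
    dt = rng.exponential(1.0 / 1.2)                       # interarrival time
    workloads = np.maximum(workloads - dt * speeds, 0.0)  # servers drain work
    job = rng.exponential(1.0)                            # arriving job size
    cell, a = dispatcher.route(workloads, job)
    response = (workloads[a] + job) / speeds[a]  # FIFO waiting + service time
    workloads[a] += job
    t += 1
    avg_cost += (response - avg_cost) / t
    dispatcher.update(cell, a, response, avg_cost)
```

Keeping a separate learner per cell, as above, is what makes such a scheme distributed: each cell's parameters are updated only from sample-path costs of jobs routed while the system state fell into that cell.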
About this article
Title
ROUTING JOBS TO HETEROGENEOUS PARALLEL QUEUES USING DISTRIBUTED POLICY GRADIENT ALGORITHM
Journal
Informatics and Applications
2021, Volume 15, Issue 3, pp 41-50
Cover Date
2021-09-30
DOI
10.14357/19922264210306
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
heterogeneous parallel queues; Markov chains with continuous state space; sojourn time optimization
Authors
M. G. Konovalov and R. V. Razumchik
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation