Systems and Means of Informatics

2014, Volume 24, Issue 3, pp 44-62

FUSED MULTIPLY-ADD: METHODOLOGICAL ASPECTS

I. Sokolov
Y. Stepchenkov
S. Bobkov
Y. Rogdestvenski
Y. Diachenko

Abstract

The paper presents approaches to designing self-timed (ST) hardware and analyzes the conditions for in-system integration of synchronous and ST units in a supercomputer network, taking the ST fused multiply-add (FMA) unit as an example. The self-timed FMA unit complies with the IEEE 754 Standard. On three operands, it performs either one double-precision FMA operation or one or two single-precision operations simultaneously. It uses ST-ternary encoding and is implemented in 65-nanometer CMOS (complementary metal-oxide-semiconductor) technology. Depending on the implementation, it can operate in either an asynchronous or a synchronous environment. It delivers a performance of at least 1 GFlops, with a latency of no more than 6 ns measured from the arrival of the input data.
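An IEEE 754 fused multiply-add is defined to compute a×b + c as if with unbounded precision and to round the result only once. Separate multiply and add operations round twice. The Python sketch below illustrates that distinction for double-precision operands. It is a software reference model under stated assumptions, not the ST FMA hardware described in the paper. Exact rational arithmetic (`fractions.Fraction`) stands in for the unit's wide internal datapath. The function names `fma_reference` and `mul_then_add` are hypothetical. Only finite binary64 operands and round-to-nearest-even are covered.

```python
from fractions import Fraction
import math


def fma_reference(a: float, b: float, c: float) -> float:
    """Hypothetical reference model: a*b + c computed exactly, rounded once.

    Assumptions: finite binary64 operands, round-to-nearest-even only.
    IEEE 754 signed-zero rules, NaN/infinity handling and exception flags
    are not modelled; results beyond the binary64 range raise
    OverflowError instead of returning infinity.
    """
    for x in (a, b, c):
        if not math.isfinite(x):
            raise ValueError("sketch handles finite operands only")
    # Fraction(float) is exact, so the product and the sum carry no error.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # Converting back to float is the single rounding step.
    return float(exact)


def mul_then_add(a: float, b: float, c: float) -> float:
    """Unfused evaluation for comparison: two separate roundings."""
    product = a * b     # first rounding: the product is rounded to binary64
    return product + c  # second rounding: the sum is rounded to binary64


if __name__ == "__main__":
    # 0.1 is not exactly representable, so 0.1 * 10 is slightly above 1.
    print(mul_then_add(0.1, 10.0, -1.0))   # 0.0 (the excess is rounded away)
    print(fma_reference(0.1, 10.0, -1.0))  # 5.551115123125783e-17 (2**-54)
```

In this example, the unfused path loses the 2^-54 excess when the product is rounded to 1.0. The single-rounding path keeps it. Python 3.13 and later also provide `math.fma`, which performs the same single-rounding operation for floats.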

