Network scientific periodic publication

Estimation of the hardware complexity of a device for multiplying square binary matrices with pipelining of data reading operations from specialized multiport memory

2025, No. 143

Authors

Bolgak A. V.^*, Vatutin E. I.^**

*e-mail: aleksei.bolgack@yandex.ru
**e-mail: evatutin@rambler.ru

Abstract

This article reviews the application areas of matrix calculations and the approaches to performing matrix multiplication. It considers the main methods for optimizing matrix processing at the software and hardware levels, as well as the main types of digital devices based on the principle of parallel-pipeline information processing. It proposes a systolic device for fast multiplication of square binary matrices of size n × n, whose distinctive feature is the pipelining of data reading operations from a specialized multiport memory. The hardware complexity of the proposed device is evaluated and compared with that of a prototype device based on systolic structures. The corresponding structural and functional organization of the multiport memory provides reading of 2n pairs of matrix coefficients per cycle, which is significantly better than classical memory (for example, DDR or GDDR), which provides reading of only one operand per cycle. The performance evaluation showed that for matrix sizes n > 64 the device operating time (pipeline cycle) is still limited by the rate at which data arrive from the memory. The results show that most of the equivalent gates are spent on implementing the specialized memory. Depending on the matrix size n, the device with pipelined reading from the specialized multiport memory has 5.5 to 8.8 times greater hardware complexity than the prototype device, while the processing time for binary matrices of size n < 2000 decreases by a factor of up to 200, which makes practical implementation of the proposed device on an FPGA or ASIC appropriate.
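For orientation, the operation the device accelerates can be sketched in software. The abstract does not say whether the product of binary matrices is taken in Boolean arithmetic (AND/OR) or modulo 2. The sketch below assumes the Boolean form, and the function name `bool_matmul` and the bit-packing of rows and columns are illustrative choices, not the authors' design.

```python
def bool_matmul(a, b):
    """Boolean product of two n x n 0/1 matrices (assumed semantics):
    c[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    # Pack every row of a and every column of b into one integer,
    # so a single bitwise AND compares a whole row with a whole column.
    rows = [sum(a[i][k] << k for k in range(n)) for i in range(n)]
    cols = [sum(b[k][j] << k for k in range(n)) for j in range(n)]
    # c[i][j] is 1 when the row and the column share at least one set bit.
    return [[1 if rows[i] & cols[j] else 0 for j in range(n)]
            for i in range(n)]
```

Each output element consumes one row of the first matrix and one column of the second, that is, n coefficient pairs, which is why the rate at which operand pairs can be read from memory matters for a hardware implementation.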

Keywords:

matrix multiplication, performance evaluation, multiported specialized memory, specialized computing facilities, systolic computing facilities

