A technique for ensuring fault tolerance of a communication network with a multidimensional torus topology


DOI: 10.34759/trd-2020-114-17

Authors

Simonov A. S.*, Zhabin I. A.**

Scientific Research Center of Electronic Computing Technology, 125, Varshavskoye shosse, Moscow, 117587, Russia

*e-mail: simonov@nicevt.ru
**e-mail: zhabin@nicevt.ru

Abstract

The article describes a method for ensuring the fault tolerance of a communication network with a multidimensional torus topology, tested during the development of the Angara high-speed interconnect router. High-speed data transmission involves a number of difficulties that the hardware designer must overcome to achieve acceptable fault tolerance. The problem becomes especially acute as the number of nodes in the system grows, since the probability of failure rises with the number of connections.
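The scaling effect can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the article: links fail independently with the same probability over some interval, and a k-ary n-dimensional torus (k > 2) with k^n nodes has n·k^n bidirectional links. The function names and the sample failure probability are hypothetical.

```python
def torus_link_count(k: int, n: int) -> int:
    """Number of bidirectional links in a k-ary n-dimensional torus (k > 2)."""
    if k <= 2 or n < 1:
        raise ValueError("expects k > 2 and n >= 1")
    nodes = k ** n
    # Each node has 2*n ports and every link is shared by two nodes.
    return n * nodes


def prob_any_link_fails(links: int, p_link: float) -> float:
    """P(at least one of `links` independent links fails) = 1 - (1 - p)^links."""
    return 1.0 - (1.0 - p_link) ** links


if __name__ == "__main__":
    p = 1e-4  # hypothetical per-link failure probability, for illustration only
    for k, n in [(4, 3), (8, 3), (16, 3)]:
        links = torus_link_count(k, n)
        risk = prob_any_link_fails(links, p)
        print(f"{k}-ary {n}-D torus: {links} links, P(any failure) = {risk:.4f}")
```

Under these assumptions the chance that at least one link is faulty approaches 1 as the torus grows, which is why the fault tolerance measures have to operate per link rather than rely on a fault-free network.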

The work on ensuring fault tolerance was carried out in the following areas:

– improving the quality of the transmitted signal by fine-tuning the parameters of the high-speed transceivers;

– automated control of the signal eye diagram parameters and selection of optimal parameters for the transmitting and receiving sides of the data transmission channel;

– developing a fault-tolerant link integrity control algorithm capable of retransmitting damaged data;

– developing a data flow control mechanism that manages the credit information of the virtual channels;

– developing network layer algorithms for monitoring the network state and quickly rebuilding the routing tables.
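The credit-based flow control mentioned in the list can be sketched as follows. This is a generic illustration of the idea, not the Angara implementation: the class name, the unit of transfer (a "flit") and the buffer size are assumptions made for the example.

```python
from collections import deque


class CreditedVirtualChannel:
    """Sender-side view of one virtual channel with credit-based flow control."""

    def __init__(self, receiver_buffer_slots: int):
        # Credits mirror the free buffer slots at the receiver (assumed size).
        self.credits = receiver_buffer_slots
        self.pending = deque()  # flits waiting for a credit
        self.sent = []          # flits actually put on the link

    def enqueue(self, flit):
        """Queue a flit and send it at once if a credit is available."""
        self.pending.append(flit)
        self._drain()

    def credit_returned(self, count: int = 1):
        """The receiver freed `count` buffer slots and returned the credits."""
        self.credits += count
        self._drain()

    def _drain(self):
        # Send only while a credit is available, so the receiver buffer
        # cannot overflow and no flit has to be dropped.
        while self.pending and self.credits > 0:
            self.credits -= 1
            self.sent.append(self.pending.popleft())


if __name__ == "__main__":
    vc = CreditedVirtualChannel(receiver_buffer_slots=2)
    for flit in ("a", "b", "c"):
        vc.enqueue(flit)
    print(vc.sent)        # the third flit waits for a credit
    vc.credit_returned()
    print(vc.sent)
```

The sender never transmits without a credit, so loss through buffer overflow is ruled out by construction. The cost is that a lost or corrupted credit message would stall the channel, which is presumably why the credit information itself needs a control mechanism.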

The algorithms and mechanisms described in the article ensure the continuity of the transmitted data flow and its guaranteed delivery over the communication lines between nodes, and exclude packet loss, duplication and corruption during transmission. Measurements showed a bit error rate (BER) of about 10^-12, which is within permissible values.
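How sequence numbers and a checksum together rule out loss, duplication and corruption can be shown with a minimal sketch. It uses a plain stop-and-wait retransmission scheme with CRC-32 from Python's `zlib` module, chosen for brevity. The abstract does not specify the actual link protocol, so the frame layout, class and function names are all hypothetical.

```python
import zlib


def make_frame(seq: int, payload: bytes) -> bytes:
    """Frame = 2-byte sequence number + payload + 4-byte CRC-32 over both.

    Assumes fewer than 65536 frames (no sequence number wrap-around).
    """
    body = seq.to_bytes(2, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


class Receiver:
    def __init__(self):
        self.expected_seq = 0
        self.delivered = []

    def receive(self, frame: bytes) -> int:
        """Process a frame; return the next expected sequence number (the ACK)."""
        body, crc = frame[:-4], frame[-4:]
        if len(frame) >= 6 and zlib.crc32(body).to_bytes(4, "big") == crc:
            seq = int.from_bytes(body[:2], "big")
            if seq == self.expected_seq:
                self.delivered.append(body[2:])
                self.expected_seq += 1
            # Any other sequence number is a duplicate and is ignored.
        # A frame with a bad CRC is discarded without delivery.
        return self.expected_seq


def send_all(payloads, receiver, corrupt_first_attempt=frozenset()):
    """Send payloads in order, resending each frame until it is acknowledged."""
    attempts = 0
    for seq, payload in enumerate(payloads):
        first = True
        while True:
            frame = make_frame(seq, payload)
            if first and seq in corrupt_first_attempt:
                # Inject an error into the first byte to simulate line noise.
                frame = bytes([frame[0] ^ 0xFF]) + frame[1:]
            first = False
            attempts += 1
            if receiver.receive(frame) == seq + 1:
                break
    return attempts


if __name__ == "__main__":
    rx = Receiver()
    total = send_all([b"a", b"b", b"c"], rx, corrupt_first_attempt={1})
    print(rx.delivered, "delivered in", total, "transmissions")
```

The corrupted frame fails the CRC check and is resent, while a replayed frame carries an old sequence number and is dropped, so each payload is delivered exactly once and in order. A hardware link would normally keep several frames in flight with a replay buffer instead of waiting for each acknowledgement.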

The methods described in the article make it possible to achieve an availability factor above 0.99 for large computing systems. They are currently implemented in serially produced Angara network equipment and are also widely used in the design of the second-generation Angara high-speed interconnect router.

Keywords:

fault tolerance, communication network


