Network scientific periodic publication

Application of neural network-based semantic segmentation in resource-constrained real-time computer vision systems

2026, No. 146

Authors

Korytkin N. G.

Lomonosov Moscow State University, 1, Leninskie Gory, Moscow, 119991, Russia

e-mail: korytkinng@my.msu.ru

Abstract

This paper investigates two neural network architectures for real-time semantic image segmentation based on DeepLabv3+, with modified MobileNetV3-Small and ResNet50 models as backbone encoders. Unmanned ground vehicles (UGVs), mobile robotic systems, and aerospace applications that operate in complex, dynamic environments require both high segmentation accuracy and real-time processing. The encoders were modified by removing the classification layers while retaining the convolutional feature extraction layers, which allows them to be integrated with the DeepLabv3+ decoder module. The result is two architectures of different computational complexity, each intended for specific hardware platforms. Experiments were run on two test platforms: a desktop system with an AMD Ryzen 5 3600 CPU and a discrete NVIDIA GeForce RTX 3050 GPU, and a laptop with an AMD Ryzen 7 5700U mobile processor with integrated graphics. Training and validation used the Yamaha-CMU Off-Road (YCOR) dataset, with mIoU, Pixel Accuracy, and Mean Accuracy as evaluation metrics. The model with the MobileNetV3-Small encoder achieved higher segmentation accuracy (mIoU = 55.56%) than the ResNet50-based variant (mIoU = 49.30%), while the ResNet50 architecture ran faster on a discrete GPU. With hardware acceleration enabled, both models processed 1920×1080 video sequences at no fewer than 30 frames per second. The scientific contribution of this work is a detailed comparative analysis of two modified DeepLabv3+ architectures under conditions close to real-world deployment of mobile robotic systems, which demonstrates how the type of hardware platform affects the trade-off between segmentation accuracy and processing speed. Based on the results, practical recommendations are given for selecting an architecture for embedded and high-performance systems.
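The three evaluation metrics named in the abstract (mIoU, Pixel Accuracy, Mean Accuracy) are commonly derived from a per-class confusion matrix. The following is a minimal sketch in plain Python, not the author's evaluation code: the function names, the flat label lists, and the choice to skip classes absent from both the ground truth and the prediction are assumptions made for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (ground-truth class, predicted class) pair.

    y_true and y_pred are flat sequences of integer class labels
    (hypothetical input layout; a real mask would be flattened first).
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return Pixel Accuracy, Mean Accuracy, and mIoU from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix contains no pixels")
    correct = sum(cm[i][i] for i in range(n))

    ious, accs = [], []
    for i in range(n):
        tp = cm[i][i]
        gt = sum(cm[i])                        # pixels labelled as class i
        pred = sum(cm[j][i] for j in range(n))  # pixels predicted as class i
        union = gt + pred - tp
        # Assumption: classes that never occur are left out of the averages.
        if union > 0:
            ious.append(tp / union)
        if gt > 0:
            accs.append(tp / gt)

    return {
        "pixel_accuracy": correct / total,
        "mean_accuracy": sum(accs) / len(accs),
        "miou": sum(ious) / len(ious),
    }


if __name__ == "__main__":
    # Toy two-class example with four pixels.
    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(segmentation_metrics(cm))
```

Pixel Accuracy weights every pixel equally, so frequent classes dominate it, whereas Mean Accuracy and mIoU average over classes. This is why the three values can rank two models differently on an imbalanced dataset.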

Keywords:

semantic segmentation, DeepLabv3+, MobileNetV3, ResNet50, real-time, robotics.

