COMPARATIVE ANALYSIS OF REAL-TIME SEMANTIC SEGMENTATION ALGORITHMS
DOI:
https://doi.org/10.31891/csit-2024-4-11
Keywords:
semantic segmentation, real-time image processing, neural networks, machine learning, deep learning
Abstract
Semantic segmentation is a fundamental task in computer vision that enables machines to interpret images at the pixel level, providing a deeper understanding of scene composition. By assigning a class to each pixel, this technique is critical for applications requiring detailed visual comprehension, such as autonomous driving, robotics, medical imaging, and augmented reality. This article presents a comparative analysis of deep learning models designed specifically for real-time semantic segmentation, focusing on their performance metrics, architectures, and application contexts. The models compared include PIDNet, PP-LiteSeg, BiSeNet, SFNet, and others, and they are evaluated using Mean Intersection over Union (mIoU) and Frames Per Second (FPS), together with the hardware specifications on which they were tested. PIDNet, known for its multi-branch architecture, emphasizes detail, context, and boundary information to improve segmentation precision without sacrificing speed. PP-LiteSeg, on the other hand, uses a Short-Term Dense Concatenate Network (STDCNet) backbone and excels at reducing computational complexity while maintaining competitive accuracy and inference speed, making it well suited to resource-constrained environments. The analysis evaluates the trade-offs between accuracy and computational efficiency using benchmark datasets such as Cityscapes and DeepScene. Additionally, we examine the adaptability of these models to diverse operational scenarios, particularly on edge devices such as the NVIDIA Jetson Nano, where computational resources are limited. The discussion extends to the challenges of real-time implementation, including maintaining robustness across varying environments and achieving high performance with minimal latency.
By highlighting the strengths, limitations, and practical implications of these models, this analysis can serve as a valuable resource for researchers and practitioners aiming to advance the field of real-time semantic segmentation.
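For readers unfamiliar with the two metrics named in the abstract, the following is a minimal Python sketch of how mIoU and FPS are commonly computed. It is an illustration under stated assumptions, not the evaluation code used in the study: the function names (`confusion_matrix`, `mean_iou`, `measure_fps`), the ignore label 255 (the convention used by Cityscapes), and the warm-up and run counts are all choices made here for the example.

```python
import time
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels per (true class, predicted class) pair.

    Assumes pred and target are integer label maps of the same shape and
    that every predicted label lies in [0, num_classes).
    """
    mask = target != ignore_index  # drop pixels marked as "ignore"
    idx = target[mask].astype(np.int64) * num_classes + pred[mask].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Mean Intersection over Union, averaged over classes that occur."""
    tp = np.diag(conf).astype(np.float64)
    # union = predicted-as-c + labelled-as-c - correctly-predicted-as-c
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0  # skip classes absent from both maps
    if not present.any():
        return float("nan")
    return float((tp[present] / union[present]).mean())


def measure_fps(infer, frame, warmup=3, runs=20):
    """Average frames per second of a callable over repeated runs."""
    for _ in range(warmup):  # warm-up runs are excluded from timing
        infer(frame)
    start = time.perf_counter()
    for _ in range(runs):
        infer(frame)
    elapsed = time.perf_counter() - start
    return runs / elapsed


# Toy 2x2 example: one pixel of class 0 is mislabelled as class 1.
target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou = mean_iou(confusion_matrix(pred, target, num_classes=2))
fps = measure_fps(lambda x: x, pred)  # identity stand-in for a real model
```

In the toy example, class 0 has an IoU of 1/2 and class 1 an IoU of 2/3, so the mean is about 0.583. When timing a model on a GPU, the device should be synchronized before each clock reading, because kernels are launched asynchronously and the measured time would otherwise be too short.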