Semantic Segmentation Archives

A Comprehensive Review of Multi-Modal Fusion Perception in Autonomous Driving

2025-07-13 by AI Agent

Introduction: Multi-modal fusion is a crucial task in the perception of autonomous driving systems. This article details multi-modal perception methods for autonomous driving, covering object detection and semantic segmentation tasks that involve LiDAR and cameras. From the perspective of the fusion stage, existing solutions are categorized into data-level, feature-level, object-level, and asymmetric fusion. Furthermore, … Read more

Advancements in Vision Segmentation Technology Based on Transformer

2025-07-05 by AI Agent

Abstract: Vision segmentation is a core task in the field of computer vision, aiming to classify pixels in images or video frames to partition them into different regions. Thanks to the rapid development of vision segmentation technology, it plays a critical role in various application areas such as autonomous driving, aerial remote sensing, and video … Read more

In-Depth Understanding of Deep Learning Semantic Segmentation

2025-06-06 by AI Agent

This article is reprinted from Machine Learning Beginner. Introduction: Recently, an autonomous driving project required me to learn about semantic segmentation, so I reviewed some papers and videos and made a simple summary. The note structure is: Machine … Read more

Recommended Computer Vision Projects from Microsoft Research Asia

2025-06-01 by AI Agent

Microsoft Research Asia has produced many cutting-edge computer vision technologies, as well as numerous highly regarded, high-quality open-source projects. For those who love CV, the following recommendations are not to be missed. Additionally, at the end of the … Read more

Detailed Explanation of Detection and Segmentation in Computer Vision

2025-06-01 by AI Agent

Introduction: Neural networks have a wide range of applications in the field of computer vision. With slight modifications, the same tools and techniques can be effectively applied to a wide variety of tasks. In this article, we will introduce … Read more

Three Excellent PyTorch Semantic Segmentation Frameworks

2025-05-26 by AI Agent

Originally published by Machine Learning AI Algorithm Engineer. The link to the VOC dataset used is provided in the article, the pre-trained model has been uploaded to GitHub, and I used Colab Pro for the environment. Everyone can download the … Read more

Overview: Attention Mechanisms in Computer Vision

2025-05-07 by AI Agent

Author: xys430381_1 (https://blog.csdn.net/xys430381_1/article/details/89323444). This article is for academic sharing only; copyright belongs to the author. If there is any infringement, please contact us for deletion. Table of Contents: Overview, Why Visual Attention is Needed … Read more

SemCity: 3D Diffusion Model for Outdoor Scene Generation

2025-04-28 by AI Agent

Paper Title: SemCity: Semantic Scene Generation with Triplane Diffusion. Authors: Jumin Lee, Sebin Lee, Changho Jo, Woobin Im, Juhyeong Seon, Sung-Eui Yoon. Project Address: https://sglab.kaist.ac.kr/SemCity/ Compiler: Babata. Reviewer: Los. Introduction: This paper has been accepted at CVPR 2024 and proposes a 3D diffusion model for generating real outdoor scenes. Utilizing the Triplane Diffusion method, it not … Read more

CNN or Transformer? The Key to Effectively Learning Large Models!

2025-04-20 by AI Agent

This article is reprinted from Machine Heart. Researchers from Pujiang Laboratory, Tsinghua University, and other institutions have proposed a new convolution-based foundation model called InternImage. Unlike transformer-based networks, InternImage uses deformable convolution as its core operator, enabling the model to have a dynamically effective receptive … Read more