Principles and Applications of OCR Technology

Introduction: Text is one of the most important sources of information for humans, and natural scenes are filled with various character symbols. OCR (Optical Character Recognition) is a familiar term, referring to the process where electronic devices … Read more

SegRefiner: High-Precision Image Segmentation via Diffusion

This article shares the NeurIPS 2023 paper "SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process", which achieves high-precision image segmentation through diffusion.
Paper link: https://arxiv.org/abs/2312.12425
Open-source code: https://github.com/MengyuWang826/SegRefiner
Background: Although image segmentation has been widely researched and … Read more

Transformers in Computer Vision

This article is reprinted from AI Park. Author: Cheng He. Translation: ronghuaiyang. Introduction: Applying Transformers to CV tasks is becoming increasingly common, and here we collect some related advances. The Transformer architecture has achieved state-of-the-art results in many natural language processing tasks. One major breakthrough of the Transformer model may be the release … Read more

Overview of Transformer Small Object Detection

Transformers have rapidly gained popularity in the field of computer vision, particularly in object recognition and detection. After reviewing the results of state-of-the-art object detection methods, we noticed that Transformers outperform mature CNN-based detectors on almost every … Read more

CNN or Transformer? The Key to Effectively Learning Large Models!

This article is reprinted from Machine Heart. Researchers from Pujiang Laboratory, Tsinghua University, and other institutions proposed a new convolution-based foundation model called InternImage. Unlike transformer-based networks, InternImage uses deformable convolution as its core operator, enabling the model to have a dynamically effective receptive … Read more

Understanding 10+ Visual Transformer Models

Transformers, as an attention-based encoder-decoder architecture, have not only revolutionized the field of Natural Language Processing (NLP) but have also made groundbreaking contributions in the field of Computer Vision (CV). Compared to Convolutional Neural Networks (CNNs), Visual Transformers (ViT) rely on their excellent modeling capabilities, achieving outstanding performance across multiple benchmarks such as ImageNet, COCO, … Read more

Why Transformers Are Slowly Replacing CNNs in CV

Author: Pranoy Radhakrishnan. Translator: wwl. Proofreader: Wang Kehan. This article is about 3000 words long and takes roughly 10 minutes to read. It discusses the application of Transformer models in the field of computer vision and compares them with CNNs. Before understanding Transformers, consider why researchers are interested in studying Transformers when there … Read more

Exploring Transformers in Computer Vision

Original from AI Park. Author: Cheng He. Translated by: ronghuaiyang. Introduction: Applying Transformers to CV tasks is becoming increasingly common, and here are some related advancements for everyone. The Transformer architecture has achieved state-of-the-art results in many natural language processing tasks. A significant breakthrough for Transformer models may be the release of GPT-3 mid-year, which … Read more

Understanding CV Transformers: A Comprehensive Guide

Transformers, as an attention-based encoder-decoder architecture, have not only revolutionized the field of Natural Language Processing (NLP) but have also made groundbreaking contributions to the field of Computer Vision (CV). Compared to Convolutional Neural Networks (CNNs), Vision Transformers (ViT) rely on excellent modeling capabilities, achieving outstanding performance on several benchmarks including ImageNet, COCO, and ADE20k. … Read more

A Comprehensive Overview of Visual Transformers in CV: Status, Trends, and Future Directions

Source: Heart of Autonomous Driving. Editor: Deep Blue Academy. Abstract: Transformers, encoder-decoder models based on attention, have revolutionized the field of Natural Language Processing (NLP). Inspired by these significant achievements, recent pioneering work has adopted transformer-like architectures in the field of Computer Vision (CV), demonstrating their effectiveness in three fundamental CV tasks … Read more