Overview of Convolutions in Deep Learning: Applications, Challenges, and Future Trends

In today’s digital age, Convolutional Neural Networks (CNNs), a subset of Deep Learning (DL), are widely used in computer vision tasks such as image classification, object detection, and image segmentation. Many types of CNNs have been designed to meet specific needs and requirements, including one-dimensional (1D), two-dimensional (2D), and three-dimensional (3D) CNNs, as well as dilated, grouped, attention, and depthwise convolutions, and architectures discovered through Neural Architecture Search (NAS), among others. Each type of CNN has a unique structure and characteristics that make it suitable for particular tasks. A deep understanding and comparative analysis of these different types of CNNs is therefore essential to grasp their respective advantages and disadvantages. Furthermore, studying the performance, limitations, and practical applications of each type of CNN can guide the future development of new and improved architectures. We also examine, from several perspectives, the platforms and frameworks that researchers use in their studies and developments.

arXiv: https://arxiv.org/abs/2403.13001

Additionally, we explore the main research areas of CNNs, such as 6D vision, generative models, and meta-learning. This review paper provides a comprehensive examination and comparison of various CNN architectures, highlighting their architectural differences and emphasizing their respective advantages, disadvantages, applications, challenges, and future trends.

In today’s world, with the continuous advancement of technology, Deep Learning (DL) has become an inseparable part of our lives [1]. From voice assistants like Siri and Alexa to personalized recommendations on social media platforms, DL algorithms work continuously behind the scenes to understand our preferences and make our lives more convenient [2]. DL is also being applied in fields such as healthcare, finance, and transportation, fundamentally changing how these industries operate [3]-[5]. With ongoing research and development in the field of DL, we can expect more innovative applications to emerge, further enhancing our daily lives. DL has ushered in a transformative era in artificial intelligence, enabling machines to absorb vast datasets and make informed predictions [6][8]. Among the significant advances in DL, the development of Convolutional Neural Networks (CNNs) has garnered particular attention. Their impact is already evident in fields such as generative AI, medical image analysis, object recognition [9], and anomaly detection [10]. As a type of feedforward neural network, CNNs integrate convolution operations into their architecture [7][11]. These operations enable CNNs to adeptly capture complex spatial and hierarchical patterns, making them highly suitable for image analysis tasks [12].

However, CNNs are often burdened by their computational complexity during training and deployment, especially on resource-constrained devices such as mobile phones and wearables [12][13].

To enhance the energy efficiency of CNNs, two main approaches have emerged:

  • Adopting lightweight CNN architectures: these architectures are meticulously designed to achieve computational efficiency without compromising accuracy. For instance, the MobileNet family of CNNs is designed specifically for mobile devices and has demonstrated state-of-the-art accuracy across various image classification applications [13].
  • Applying compression techniques: these methods reduce the size of CNN models, thereby minimizing the amount of data transferred between devices. A notable example is the TensorFlow Lite framework, which provides a set of compression techniques for CNN models on mobile devices [14].
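To make these two approaches concrete, here is a minimal sketch (our own illustration, not code from the paper, assuming TensorFlow/Keras): it builds a MobileNet-style depthwise separable convolution layer and then applies TensorFlow Lite's post-training quantization to shrink the resulting model.

```python
import tensorflow as tf

# Lightweight architecture: a depthwise separable convolution factors a
# standard 3x3 convolution (3*3*C_in*C_out weights) into a depthwise pass
# (3*3*C_in) plus a pointwise pass (C_in*C_out) -- roughly 8x fewer
# weights for 3x3 kernels, the trick behind the MobileNet family.
inputs = tf.keras.Input(shape=(224, 224, 32))
x = tf.keras.layers.SeparableConv2D(64, 3, padding="same", activation="relu")(inputs)
outputs = tf.keras.layers.GlobalAveragePooling2D()(x)
model = tf.keras.Model(inputs, outputs)

# Compression: TensorFlow Lite post-training quantization reduces the
# stored weight precision, shrinking the model for on-device deployment.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
print(f"quantized TFLite model: {len(tflite_model)} bytes")
```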

The combination of lightweight CNN architectures and compression techniques significantly improves the energy efficiency of CNNs. This makes it possible to train and deploy CNNs on resource-constrained devices, opening new opportunities for their use in applications such as healthcare, agriculture, and environmental monitoring [12][16].

How do different convolution techniques adapt to various AI applications? Convolutions play a fundamental role in modern DL architectures, especially for grid-structured data such as images, audio signals, and sequences [23]. A convolution operation slides a small filter (also known as a kernel) over the input data, performing element-wise multiplication and summation at each position; this process extracts key features from the input [24]. The primary significance of convolutions lies in their ability to effectively capture local patterns and spatial relationships within the data. This locality property makes convolutions particularly suitable for tasks like image recognition, where objects can be identified from their local structure. Furthermore, convolutions introduce parameter sharing, greatly reducing the number of trainable parameters and yielding a more efficient and scalable model [25]. The sketch below illustrates the basic operation.
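To ground this description, the following minimal sketch (our own illustration, assuming NumPy) implements the sliding-window, multiply-and-sum operation for a single-channel 2D input with "valid" padding and stride 1. As in most DL frameworks, this is technically cross-correlation: the kernel is not flipped.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image`, taking an element-wise product and
    sum at each position ('valid' padding, stride 1, single channel)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 vertical-edge (Sobel) kernel applied to a toy 5x5 "image";
# the same 9 weights are reused at every position (parameter sharing).
image = np.arange(25, dtype=float).reshape(5, 5)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
print(conv2d_valid(image, sobel_x))  # -> 3x3 feature map
```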

Existing reviews: previous review papers such as [118] and [120] provide a good overview of the popular architectures of a given period, but they lack clear research questions and objectives and do not assess the challenges in terms of design patterns; instead, they mainly discuss architectures in chronological order. Other works have discussed the challenges of CNNs for specific concepts and applications, but have not broadly covered the intrinsic taxonomy present in novel CNN architectures. We therefore wrote this review to address these gaps by proposing a taxonomy that clearly categorizes CNN architectures by their intrinsic design patterns rather than by year of publication.

We focus on architectural innovations post-2012 and discuss recent developments in greater depth than earlier reviews. Discussing the latest trends and challenges provides researchers with updated perspectives.

This comprehensive review paper aims to accelerate research progress in this field, covering the history, taxonomy, applications, and challenges of CNNs.

The key questions we seek to address in this paper include:

  • How do state-of-the-art CNN models such as ResNet, Inception, and MobileNet perform on target hardware relative to baselines under resource constraints? What impact do they have on accuracy, latency, and memory usage?
  • How do techniques such as pruning, quantization, distillation, and architecture design minimize model size and computational complexity while maintaining prediction quality? (A minimal pruning sketch follows this list.)
  • How do multi-stage optimization methods that combine different techniques compare to single methods? Can we achieve a better trade-off between accuracy, latency, and memory?
  • What are the best practices for benchmarking, tuning, and deploying optimized CNN models for target applications like embedded vision, considering their unique constraints and specifications?
  • Which pruning and quantization techniques are best suited for our target applications and hardware? How does this compare to the baseline?
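As a small illustration of one of these techniques (a sketch of our own, not a method from the paper), unstructured magnitude pruning simply zeroes out the smallest-magnitude weights; the hypothetical `magnitude_prune` helper below shows the idea in NumPy:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest
    magnitude (unstructured pruning); returns pruned weights and mask."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
pruned, mask = magnitude_prune(w, sparsity=0.75)
print(f"kept {mask.mean():.0%} of weights")  # ~25% survive
```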

Our review makes several key contributions to the Deep Learning (DL) and Computer Vision (CV) communities:

  • Analyzing a variety of existing CNNs: providing a comprehensive and detailed analysis of the DL models and algorithms used in computer vision applications.
  • Comparing CNN models across parameters and architectures: offering insights into performance and efficiency trade-offs.
  • Identifying the strengths and weaknesses of different CNN models: helping researchers select the most suitable model for their specific applications.
  • Highlighting challenges and future directions: pointing to further improvements in the DL and computer vision fields.
  • Exploring trends in neural network architectures: emphasizing the practical applications and momentum of recent advances.
  • Providing a comprehensive overview of major research areas: covering the main directions actively pursued by researchers.

The remainder of this review is organized as follows (see Figure 1): Section 2 delves into the fundamentals of convolutions, elucidating their mathematical formulation, operational mechanics, and role in neural network architectures. Section 3 describes the basic components of CNNs. Section 4 covers 2D convolutions, 1D convolutions for sequential data, and 3D convolutions for volumetric data. Section 5 investigates advanced convolution techniques that have emerged in recent years, including transposed convolutions for upsampling, depthwise separable convolutions for efficiency, spatial pyramid pooling, and attention mechanisms within convolutions. Section 6 highlights real-world applications of different convolution types, demonstrating their practicality in image recognition, object detection, natural language processing, audio processing, and medical image analysis. Section 7 discusses future trends and open questions regarding CNNs. Section 8 addresses performance considerations for CNNs. Section 9 discusses the platforms most commonly used by researchers and developers, Section 10 surveys popular and trending research areas, and Section 11 concludes. By the end of this study, readers will have gained a deep understanding of the importance of convolutions in DL. Figure 2 provides a reader map visualizing the flow of information through the text; it illustrates the connections between sections, helping readers chart a path through the material based on the sections that interest them.
