Comprehensive Overview of Data Augmentation Techniques in Computer Vision


If we were to rank the stages of a deep learning project by importance, preparing training data would surely be near the top. A freshly written model network is merely a chunk of code with little intelligence of its own; it only learns to make useful predictions by being exposed to large amounts of data. Training data is therefore much like the secret manual in a martial-arts novel: before studying it you are a novice, but after mastering it you can dominate the world!
Unfortunately, training data shares another trait with the secret manual: it is hard to come by. Beyond public datasets, users who want to prepare data for their own business scenarios face a complex collection and labeling process, and the required scale is typically enormous, since only sufficient data makes training effective. Dataset creation is therefore expensive, and even more so in computer vision, where images must be captured and labeled one by one. Just thinking about gathering hundreds of thousands of images is enough to make one shudder!
To address the above issues, image data augmentation is a commonly used solution in the field of computer vision, often applied in scenarios with insufficient data or many model parameters. If users have limited data, they can use data augmentation techniques to expand their datasets. In some common image classification tasks, such as the ImageNet classification of a thousand objects, standard data augmentation methods, including random cropping and flipping, are employed during preprocessing. In addition to these standard methods, PaddleClas, a suite for image classification, also supports eight additional data augmentation methods, which will be explained one by one below.

All the code below is from PaddleClas:

GitHub link:

https://github.com/PaddlePaddle/PaddleClas

Gitee link:

https://gitee.com/paddlepaddle/PaddleClas

Eight Major Data Augmentation Methods

First, let’s take a look at the standard data augmentation methods represented by the ImageNet image classification task. The operational process can be divided into the following steps (a code sketch of the full pipeline follows the list):

  1. Image decoding, which converts the image into Numpy format, abbreviated as ImageDecode.
  2. Image random cropping, randomly cropping the width and height of the image to 224, abbreviated as RandCrop.
  3. Random horizontal flipping, abbreviated as RandFlip.
  4. Normalization of image data, abbreviated as Normalize.
  5. Rearranging image data. The format of the image data is [H, W, C] (height, width, and channel), while the training data format used by neural networks is [C, H, W]. Therefore, the image data needs to be rearranged, for example, [224, 224, 3] becomes [3, 224, 224], abbreviated as Transpose.
  6. Batching multiple images into batch data, for example combining BatchSize images of [3, 224, 224] into [BatchSize, 3, 224, 224], abbreviated as Batch.
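As a rough code sketch of this standard pipeline (the operator names RandCropImage, RandFlipImage, and NormalizeImage are assumed here and may differ between PaddleClas versions; batching is normally handled by the data loader rather than by these operators):
import os

from ppcls.data.imaug import DecodeImage     # ImageDecode
from ppcls.data.imaug import RandCropImage   # RandCrop (assumed class name)
from ppcls.data.imaug import RandFlipImage   # RandFlip (assumed class name)
from ppcls.data.imaug import NormalizeImage  # Normalize (assumed class name)
from ppcls.data.imaug import ToCHWImage      # Transpose
from ppcls.data.imaug import transform

size = 224
ops = [
    DecodeImage(),             # 1. decode to a Numpy array in [H, W, C]
    RandCropImage(size=size),  # 2. randomly crop to 224 x 224
    RandFlipImage(),           # 3. random horizontal flip
    NormalizeImage(),          # 4. normalize pixel values
    ToCHWImage(),              # 5. rearrange [224, 224, 3] -> [3, 224, 224]
]

imgs_dir = "/imgdir/"          # directory of sample images (placeholder path)
for f in os.listdir(imgs_dir):
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)  # a single [3, 224, 224] sample
# 6. the data loader then stacks BatchSize samples into [BatchSize, 3, 224, 224]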
Compared to the standard image augmentation methods mentioned above, researchers have also proposed many improved image augmentation strategies. These strategies insert certain operations at different stages of the standard augmentation methods, and can be broadly classified into three categories based on the stage of operation:
  1. Image transformation: Performing transformations on the 224×224 image after RandCrop, including AutoAugment and RandAugment.
  2. Image cropping: Cropping (masking out) parts of the 224×224 image after Transpose, including Cutout, RandomErasing, HideAndSeek, and GridMask.
  3. Image mixing: Mixing or overlaying data after Batch, including Mixup and Cutmix.
PaddleClas integrates all the above data augmentation strategies, and the reference papers and open-source code for each strategy are listed in the following introductions. This document will introduce the principles and usage of these strategies, and visualize the effects of the transformations with the following images.
[Image: sample images used for the visualizations below]

Image Transformation

Image transformation methods modify images by combining several augmentation sub-strategies, such as brightness adjustment, contrast enhancement, and sharpening. Depending on how the sub-strategies are combined, these methods can be divided into AutoAugment and RandAugment.

01

AutoAugment

Paper link:

https://arxiv.org/abs/1805.09501v1

Unlike conventional manually designed image augmentation methods, AutoAugment finds and combines suitable image augmentation schemes for specific datasets through search algorithms within a series of image augmentation sub-strategies. For the ImageNet dataset, the final searched data augmentation scheme contains 25 combinations of sub-strategies, each containing two transformations. A sub-strategy combination is randomly selected for each image, and a certain probability determines whether to execute each transformation in the sub-strategy.
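Conceptually, applying a searched policy works roughly as in the sketch below (an illustration of the mechanism described above rather than the PaddleClas implementation; policy is assumed to be a list of 25 sub-policies, each a pair of (transform_fn, probability, magnitude) entries):
import random

def apply_autoaugment(img, policy, rng=random):
    # Randomly pick one of the 25 searched sub-policies for this image
    sub_policy = rng.choice(policy)
    # Each of the two transforms fires with its own searched probability
    for transform_fn, prob, magnitude in sub_policy:
        if rng.random() < prob:
            img = transform_fn(img, magnitude)
    return img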
The usage of AutoAugment in PaddleClas is as follows:
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ImageNetPolicy
from ppcls.data.imaug import transform

size = 224
# Image decoding
decode_op = DecodeImage()
# Image resizing
resize_op = ResizeImage(size=(size, size))
# Using the AutoAugment image augmentation method
autoaugment_op = ImageNetPolicy()

ops = [decode_op, resize_op, autoaugment_op]
# Directory containing the sample images
imgs_dir = "/imgdir/"
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
The transformation results are shown in the following images.
[Image: AutoAugment transformation results]

02

RandAugment

Paper link:
https://arxiv.org/pdf/1909.13719.pdf
AutoAugment’s search method is quite brute-force, directly searching for the optimal strategy for the dataset, which can be computationally expensive. In the corresponding paper for RandAugment, the authors found that for larger models and datasets, the benefits from the augmentation strategies found using AutoAugment decrease; furthermore, the optimal strategies found are specific to the designated dataset and have poor transferability to other datasets.
In RandAugment, the authors propose a purely random augmentation method: unlike AutoAugment, where each sub-strategy is applied according to a searched probability, all sub-strategies are selected with equal probability. Experiments in the paper also show that this data augmentation method performs well even when training large models.
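The RandAugment idea can be sketched as follows (again an illustration rather than the PaddleClas implementation; transforms is assumed to be the list of candidate augmentation functions):
import random

def apply_randaugment(img, transforms, num_layers=2, magnitude=5, rng=random):
    # Every candidate transform is chosen with the same probability;
    # only the number of layers and the shared magnitude are tuned.
    for transform_fn in rng.choices(transforms, k=num_layers):
        img = transform_fn(img, magnitude)
    return img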
The usage of RandAugment in PaddleClas is as follows:
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import RandAugment
from ppcls.data.imaug import transform

size = 224
# Image decoding
decode_op = DecodeImage()
# Image resizing
resize_op = ResizeImage(size=(size, size))
# Using the RandAugment image augmentation method
randaugment_op = RandAugment()

ops = [decode_op, resize_op, randaugment_op]
# Directory containing the sample images
imgs_dir = "/imgdir/"
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
The transformation results are shown in the following images.
[Image: RandAugment transformation results]

Image Cropping

The image cropping category mainly masks out parts of the image after the Transpose step, which can be understood as covering portions of the image. It includes four methods: Cutout, RandomErasing, HideAndSeek, and GridMask.

03

Cutout

Paper link:

https://arxiv.org/abs/1708.04552

Cutout can be understood as an extension of Dropout; the difference is that Dropout masks the features generated after the image passes through the network, while Cutout directly masks the input image. This method is more robust to noise compared to Dropout. The authors explain in the paper that this method has two advantages:
  • Cutout can simulate classification scenarios where the subject is partially covered in real-world situations.
  • It encourages the model to utilize more content in the image for classification, preventing the network from focusing only on prominent areas, thus avoiding overfitting.
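The masking operation itself is straightforward; a minimal NumPy sketch (a hypothetical helper, not the PaddleClas Cutout class) looks like this:
import numpy as np

def cutout(img, n_holes=1, length=112, rng=np.random):
    # Zero out n_holes square patches of side `length`, each centered
    # at a random position of the [H, W, C] image.
    img = img.copy()
    h, w = img.shape[:2]
    for _ in range(n_holes):
        cy, cx = rng.randint(h), rng.randint(w)
        y1, y2 = max(cy - length // 2, 0), min(cy + length // 2, h)
        x1, x2 = max(cx - length // 2, 0), min(cx + length // 2, w)
        img[y1:y2, x1:x2, :] = 0
    return img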
The usage of Cutout in PaddleClas is as follows:
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import Cutout
from ppcls.data.imaug import transform

size = 224
# Image decoding
decode_op = DecodeImage()
# Image resizing
resize_op = ResizeImage(size=(size, size))
# Using the Cutout image augmentation method
cutout_op = Cutout(n_holes=1, length=112)

ops = [decode_op, resize_op, cutout_op]
# Directory containing the sample images
imgs_dir = "/imgdir/"
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
The cropping results are shown in the following images:
[Image: Cutout results]

04

RandomErasing

Paper link:

https://arxiv.org/pdf/1708.04896.pdf

RandomErasing is similar to Cutout and likewise aims to address the poor generalization of models to occluded data; the authors also point out that the method is complementary to random cropping and random horizontal flipping. Its effectiveness has been verified in pedestrian re-identification (ReID). Unlike Cutout, in RandomErasing the preprocessing is applied to an image only with a certain probability, and the size and aspect ratio of the generated mask are drawn at random based on preset hyperparameters.
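For intuition, a simplified sketch of the erasing step (a hypothetical helper with placeholder default ranges, not the PaddleClas RandomErasing class):
import numpy as np

def random_erasing(img, p=0.5, area_range=(0.02, 0.4),
                   aspect_range=(0.3, 3.3), rng=np.random):
    # With probability p, erase a rectangle whose area and aspect ratio
    # are drawn from preset ranges, filling it with random noise.
    if rng.rand() > p:
        return img
    img = img.copy()
    h, w = img.shape[:2]
    for _ in range(10):  # retry a few times until the rectangle fits
        area = rng.uniform(*area_range) * h * w
        aspect = rng.uniform(*aspect_range)
        eh = int(round((area * aspect) ** 0.5))
        ew = int(round((area / aspect) ** 0.5))
        if 0 < eh < h and 0 < ew < w:
            y, x = rng.randint(h - eh), rng.randint(w - ew)
            img[y:y + eh, x:x + ew, :] = rng.randint(0, 256, (eh, ew, img.shape[2]))
            return img
    return img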
The usage of RandomErasing in PaddleClas is as follows:
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import RandomErasing
from ppcls.data.imaug import transform

size = 224
# Image decoding
decode_op = DecodeImage()
# Image resizing
resize_op = ResizeImage(size=(size, size))
# Image data rearrangement
tochw_op = ToCHWImage()
# Using the RandomErasing image augmentation method
randomerasing_op = RandomErasing()

ops = [decode_op, resize_op, tochw_op, randomerasing_op]
# Directory containing the sample images
imgs_dir = "/imgdir/"
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
    img = img.transpose((1, 2, 0))
The cropping results are shown in the following images.
[Image: RandomErasing results]

05

HideAndSeek

Paper link:

https://arxiv.org/pdf/1811.02545.pdf

The HideAndSeek method divides the image into several equal-sized patches and masks each patch independently with a certain probability. As shown in the image below, the resulting image may be completely occluded, not occluded at all, or partially occluded.
[Image: HideAndSeek patch masking examples]
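A minimal sketch of this patch-wise hiding (a hypothetical helper, not the PaddleClas HideAndSeek class):
import numpy as np

def hide_and_seek(img, grid_size=4, hide_prob=0.5, rng=np.random):
    # Split the [H, W, C] image into a grid_size x grid_size grid and
    # hide each patch independently with probability hide_prob.
    img = img.copy()
    h, w = img.shape[:2]
    ph, pw = h // grid_size, w // grid_size
    for gy in range(grid_size):
        for gx in range(grid_size):
            if rng.rand() < hide_prob:
                img[gy * ph:(gy + 1) * ph, gx * pw:(gx + 1) * pw, :] = 0
    return img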
The usage of HideAndSeek in PaddleClas is as follows:
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import HideAndSeek
from ppcls.data.imaug import transform

size = 224
# Image decoding
decode_op = DecodeImage()
# Image resizing
resize_op = ResizeImage(size=(size, size))
# Image data rearrangement
tochw_op = ToCHWImage()
# Using the HideAndSeek image augmentation method
hide_and_seek_op = HideAndSeek()

ops = [decode_op, resize_op, tochw_op, hide_and_seek_op]
# Directory containing the sample images
imgs_dir = "/imgdir/"
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
    img = img.transpose((1, 2, 0))
The cropping results are shown in the following images.
[Image: HideAndSeek results]

06

GridMask

Paper link:

https://arxiv.org/abs/2001.04086

The authors indicate in the paper that previous image cropping methods have two issues:
  • Excessive removal of regions may cause the main subject to be mostly or entirely removed, or lead to the loss of contextual information, turning the augmented data into noisy data;
  • Retaining too many areas may have little effect on the main subject and context, losing the significance of augmentation.
Thus, avoiding excessive removal or excessive retention becomes the core problem to solve. GridMask generates a mask with the same resolution as the original image, randomly flips the mask, and multiplies it with the original image to obtain the augmented image; hyperparameters control the size of the generated mask grid.
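A much-simplified sketch of the mask generation (a hypothetical helper with its own parameter naming; the real method, as in the paper and the PaddleClas GridMask operator, also applies random rotation and uses a different parameterization of the kept ratio):
import numpy as np

def grid_mask(h, w, d=96, keep_ratio=0.6, rng=np.random):
    # Build a binary mask with grid period d: inside every d x d cell,
    # a square of side l = d * (1 - keep_ratio) is zeroed out; a random
    # grid offset varies the pattern between images.
    l = int(d * (1 - keep_ratio))
    mask = np.ones((h, w), dtype=np.float32)
    sy, sx = rng.randint(d), rng.randint(d)
    for y in range(-sy, h, d):
        for x in range(-sx, w, d):
            mask[max(y, 0):y + l, max(x, 0):x + l] = 0.0
    return mask  # augmented image = image * mask[:, :, None]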
During training, there are two usage methods:
  • Set a probability p and use GridMask for augmentation with probability p from the start of training.
  • Initially set the augmentation probability to 0, and as the number of iterations increases, gradually increase the probability of applying GridMask augmentation to the training images until it reaches p.
The paper states that the second method yields better training results after verification.
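The second schedule can be expressed in a few lines (a hypothetical helper; the warm-up length and target probability are placeholders):
def gridmask_prob(cur_iter, warmup_iters, p=0.8):
    # Linearly increase the probability of applying GridMask from 0 to p
    # over the first warmup_iters iterations, then keep it at p.
    return p * min(1.0, cur_iter / float(warmup_iters))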
The usage of GridMask in PaddleClas is as follows:
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import GridMask
from ppcls.data.imaug import transform

size = 224
# Image decoding
decode_op = DecodeImage()
# Image resizing
resize_op = ResizeImage(size=(size, size))
# Image data rearrangement
tochw_op = ToCHWImage()
# Using the GridMask image augmentation method
gridmask_op = GridMask(d1=96, d2=224, rotate=1, ratio=0.6, mode=1, prob=0.8)

ops = [decode_op, resize_op, tochw_op, gridmask_op]
# Directory containing the sample images
imgs_dir = "/imgdir/"
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
    img = img.transpose((1, 2, 0))
The results are shown in the following images:
[Image: GridMask results]

Image Mixing

The image transformations and cropping methods mentioned earlier are operations performed on single images, whereas image mixing involves merging two images to generate one. The main difference between Mixup and Cutmix lies in how the mixing is done.

07

Mixup

Paper link:
https://arxiv.org/pdf/1710.09412.pdf
Mixup was the first image mixing augmentation scheme to be proposed; it blends the pixels of two images using a random ratio. It is simple to implement and achieves good results in both image classification and object detection. For ease of implementation, the mixing is typically performed only within a batch of data, and the same holds for Cutmix.
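The core of the batch-level mixing can be sketched as follows (a hypothetical helper operating on a NumPy batch, not the PaddleClas MixupOperator):
import numpy as np

def mixup_batch(images, labels, alpha=0.2, rng=np.random):
    # images: [N, C, H, W], labels: [N]. Each image is blended with a
    # randomly paired image from the same batch; the weight lam is later
    # used to mix the two labels in the loss in the same proportion.
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(images))
    mixed = lam * images + (1.0 - lam) * images[perm]
    return mixed, labels, labels[perm], lam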
The usage of Mixup in PaddleClas is as follows:
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import transform
from ppcls.data.imaug import MixupOperator

size = 224
# Image decoding
decode_op = DecodeImage()
# Image resizing
resize_op = ResizeImage(size=(size, size))
# Image data rearrangement
tochw_op = ToCHWImage()
# Using the Mixup image augmentation method (applied on the whole batch)
mixup_op = MixupOperator()

ops = [decode_op, resize_op, tochw_op]

# Directory containing the sample images
imgs_dir = "/imgdir/"
batch = []
fnames = os.listdir(imgs_dir)
for idx, f in enumerate(fnames):
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
    batch.append((img, idx))  # fake label

new_batch = mixup_op(batch)
The mixing results are shown in the following images.
[Image: Mixup results]

08

Cutmix

Paper link:

https://arxiv.org/pdf/1905.04899v2.pdf

Unlike Mixup, which directly adds two images together, Cutmix randomly crops a region of interest (ROI) from another image and overlays it onto the corresponding area of the current image. The code implementation is as follows:
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import transform
from ppcls.data.imaug import CutmixOperator

size = 224
# Image decoding
decode_op = DecodeImage()
# Image resizing
resize_op = ResizeImage(size=(size, size))
# Image data rearrangement
tochw_op = ToCHWImage()
# Using the Cutmix image augmentation method (applied on the whole batch)
cutmix_op = CutmixOperator()

ops = [decode_op, resize_op, tochw_op]

# Directory containing the sample images
imgs_dir = "/imgdir/"
batch = []
fnames = os.listdir(imgs_dir)
for idx, f in enumerate(fnames):
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
    batch.append((img, idx))  # fake label

new_batch = cutmix_op(batch)
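For comparison with Mixup, the batch-level Cutmix step can be sketched as follows (a hypothetical helper on a NumPy batch, not the PaddleClas CutmixOperator):
import numpy as np

def cutmix_batch(images, labels, alpha=0.2, rng=np.random):
    # images: [N, C, H, W]. Cut a random box out of a shuffled copy of
    # the batch and paste it onto the original images; the labels are
    # later mixed in proportion to the area that was actually pasted.
    n, c, h, w = images.shape
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(n)
    cut_h, cut_w = int(h * np.sqrt(1.0 - lam)), int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.randint(h), rng.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    mixed = images.copy()
    mixed[:, :, y1:y2, x1:x2] = images[perm][:, :, y1:y2, x1:x2]
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / float(h * w)
    return mixed, labels, labels[perm], lam_adj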
The mixing results are shown in the following images.
[Image: Cutmix results]

Experiments

Experiments with PaddleClas on the ImageNet1k dataset yield the classification accuracies below for the different data augmentation methods, indicating that data augmentation can effectively improve model accuracy.

[Image: classification accuracy of the different data augmentation methods on ImageNet1k]
Note:
In this experiment, the l2 decay was fixed at 1e-4 to facilitate comparison. In actual use, a smaller l2 decay generally yields better results: combined with data augmentation, reducing the l2 decay from 1e-4 to 7e-5 can bring at least a 0.3%-0.5% accuracy improvement.

PaddleClas Data Augmentation Pitfall-Avoidance Guide and Precautions

Finally, here are some small tricks for using data augmentation in PaddleClas:

  • When using the image mixing methods, you need to set use_mix to True in the configuration file. In addition, because the labels are mixed along with the images, the training accuracy cannot be computed, so it is not printed during training.
  • With data augmentation enabled, the training data becomes harder, so the training loss may be larger and the training-set accuracy relatively lower; however, the model generalizes better, leading to higher validation-set accuracy.
  • After using data augmentation, the model may tend to underfit. It is recommended to appropriately reduce the value of l2_decay to achieve higher validation set accuracy.
  • Almost every type of image augmentation contains hyperparameters. PaddleClas only provides hyperparameters based on ImageNet-1k; users need to adjust hyperparameters for other datasets. If you are unclear about the meaning of hyperparameters, you can read the relevant papers, and the methods for tuning can be referenced in the training tips section (https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/zh_CN/models/Tricks.md).
If you encounter any issues during use, you can join the official PaddlePaddle QQ group for communication: 1108045677.
If you want to learn more about PaddlePaddle, please refer to the following documents.
·PaddlePaddle Official Website·
https://www.paddlepaddle.org.cn/
·PaddlePaddle Open Source Framework Project Address·
GitHub:
https://github.com/PaddlePaddle/Paddle
Gitee:
https://gitee.com/paddlepaddle/Paddle
·PaddleClas Project Address·
GitHub:
https://github.com/PaddlePaddle/PaddleClas
Gitee:
https://gitee.com/paddlepaddle/PaddleClas
