Compiled by New Intelligence
Source: blog.kaggle.com
Compiled by: Jia Yuepeng
[New Intelligence Guide] The champion team of the Kaggle ocean fish recognition and classification competition shares its techniques: How do you design robust optimization algorithms? How do you analyze the data and perform data augmentation? Technical details include using images from different boats for validation and how to handle night vision images.
This year, the Kaggle community hosted the Nature Conservancy Fisheries Monitoring competition, calling for participants to develop algorithms capable of automatically detecting and classifying marine species caught by fishing boats.
Illegal fishing poses a threat to marine ecosystems. These algorithms will help enhance the ability of the Nature Conservancy to analyze data from camera monitoring systems. In the following winner interview, the champion team “Towards Robust-Optimal Learning of Learning” (Gediminas Pekšys, Ignas Namajūnas, Jonas Bialopetravičius) shares the technical details of their algorithms, such as how to use images from different boats for validation and how to handle night vision images.
Since the photos in the competition dataset cannot be made public, the team hired graphic designer Jurgita Avišansytė to create illustrations for this blog post.
What was your background before entering this challenge?
P: I graduated in Mathematics from Cambridge, worked as a data scientist/consultant for about 2 years and as a software engineer for about 1.5 years, and have roughly 1.5 years of experience in object detection research and framework development as a research engineer on surveillance applications.
N: Bachelor’s in Mathematics, Master’s in Computer Science, and 3 years of R&D work, with 9 months of experience as the lead researcher in a monitoring project.
B: Bachelor’s in Software Engineering, Master’s in Computer Science, 6 years of professional experience in computer vision and machine learning, currently researching astrophysics, with a strong interest in applying deep learning methods.
What previous experience or domain knowledge helped you succeed in this competition?
P: The work and research experience I gained from my last Kaggle competition helped me in this competition, particularly in establishing a reasonable validation method within the first week.
N: My university studies (mainly self-taught), R&D work experience, the experience of the previous two Kaggle computer vision competitions, and reading arXiv papers daily.
B: My master’s thesis was on deep learning, and I also have some Kaggle competition experience. I regularly solve computer vision problems at work.
How did you start participating in Kaggle competitions?
P: I first heard about Kaggle in my first year as a data scientist, but I didn’t consider competing until a few years later when I shifted to computer vision. Kaggle competitions allow you to focus on slightly different problems/datasets and effectively validate different methods.
N: I used to enjoy participating in competitions like ACM ICPC. I didn’t achieve particularly noteworthy accomplishments, but participating as a member of the Vilnius University team in international competitions was the best experience of my student life. After I started working in machine learning and computer vision, I fell in love with long-term challenges, so Kaggle was a perfect fit.
B: I enjoy solving machine learning problems, and Kaggle is the platform for that.
What made you decide to participate?
P: I wanted to experiment more with stacking and customizing models for computer image detection and classification. I also wanted to compare recent detection frameworks/architectures.
N: Object detection is one of my strengths, and this problem looked challenging because of the highly variable "in the wild" imaging conditions.
B: Mainly because this competition looked very challenging, especially with the lack of good data.
Did you borrow any methods from previous research or competitions?
We borrowed from Faster R-CNN, which performed well in previous competitions, and we have experience using and modifying it.
What supervised learning methods did you use?
We mainly used Faster R-CNN with VGG-16 as the feature extractor, with one model using R-FCN with ResNet-101.
How did you perform data preprocessing and data augmentation?
The augmentation pipeline used for training the models was quite standard. We applied random rotation, horizontal flipping, blurring, and scale changes, all of which improved validation scores. However, the two most important factors were how we handled night vision images and image color.
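The standard augmentations above can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the team's actual pipeline; 90-degree rotations and a 3x3 box blur stand in for the arbitrary-angle rotation and blurring they describe.

```python
import numpy as np

rng = np.random.default_rng(42)

def box_blur(img):
    """3x3 box blur, a simple stand-in for the blurring augmentation."""
    pad = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / 9.0

def augment(img):
    """Apply a random horizontal flip, rotation, blur, and intensity jitter."""
    img = img.astype(np.float64)
    if rng.random() < 0.5:
        img = img[:, ::-1]                    # horizontal flip
    img = np.rot90(img, k=rng.integers(4))    # rotation (90-degree steps here)
    if rng.random() < 0.3:
        img = box_blur(img)
    img = img * rng.uniform(0.8, 1.2)         # brightness jitter
    return np.clip(img, 0, 255)
```

In a real detection pipeline the same geometric transforms would also have to be applied to the bounding-box annotations, which is why the team's polygon annotations (discussed below) mattered.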
We noticed early on that night vision images were very easy to identify: just check whether the average value of the green channel is brighter than the combined average of the red and blue channels with a weighting factor of 0.75; this simple rule worked in every case. Comparing the color intensity histograms of typical daytime images and night vision images clearly shows the difference: in regular images the distributions of the three color channels usually lie close to one another, as can be seen in the figure below. The dashed lines are the best-fit Gaussians approximating these distributions.
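As a concrete sketch, the detector might look like the following. This is one plausible reading of the 0.75 weighting described above; the exact form of the rule and the RGB channel order are assumptions.

```python
import numpy as np

def is_night_vision(img, weight=0.75):
    """Night vision frames are strongly green-tinted: flag an image when
    the mean of the green channel exceeds `weight` times the combined
    mean of the red and blue channels.  `img` is an H x W x 3 RGB array.
    """
    r = img[..., 0].mean()
    g = img[..., 1].mean()
    b = img[..., 2].mean()
    return g > weight * (r + b)
```

Under this reading, a neutral gray image (r = g = b) is rejected, since g < 0.75 * 2g, while a heavily green-tinted frame passes easily.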
We wanted to increase the number of night vision images. Therefore, the final model, which was also the best-performing single model, randomly selected some training images and transformed their histograms to make them resemble night vision images. This was done separately for each color channel: treating each channel's intensity distribution as Gaussian (which it is not, strictly speaking), we modified its mean and standard deviation accordingly, essentially shrinking the red and blue channels, as shown in the figure. We also applied random contrast stretching to each color channel separately, since night vision images are very diverse and a fixed transformation cannot capture this variation.
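A rough re-creation of this idea is sketched below. The 0.4/0.6 shrink factors and the percentile ranges are illustrative guesses, not the team's values.

```python
import numpy as np

rng = np.random.default_rng(7)

def match_gaussian(channel, target_mean, target_std):
    """Shift and scale a channel so its mean and std match the targets,
    treating the intensity distribution as roughly Gaussian."""
    mu, sigma = channel.mean(), channel.std()
    return (channel - mu) / (sigma + 1e-8) * target_std + target_mean

def to_night_vision_like(img):
    """Shrink the red and blue channels toward darker, narrower
    distributions, leaving green comparatively strong."""
    img = img.astype(np.float64)
    out = img.copy()
    for c in (0, 2):  # red and blue channels
        mu, sigma = img[..., c].mean(), img[..., c].std()
        out[..., c] = match_gaussian(img[..., c], 0.4 * mu, 0.6 * sigma)
    return np.clip(out, 0, 255)

def random_contrast_stretch(img):
    """Independent random linear contrast stretch per channel."""
    out = img.astype(np.float64)
    for c in range(3):
        lo = np.percentile(out[..., c], rng.uniform(0, 5))
        hi = np.percentile(out[..., c], rng.uniform(95, 100))
        out[..., c] = (out[..., c] - lo) / max(hi - lo, 1e-8) * 255
    return np.clip(out, 0, 255)
```

Because the stretch is applied per channel with independently drawn percentiles, repeated applications produce a variety of color casts, which is the point of the augmentation.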
Since this model performed very well, we also added a model that skipped the night vision transformation and instead applied contrast stretching to all images. Since this was done separately on each channel, it could change the color of the fish or its surroundings. Given that lighting conditions on the ocean vary greatly, colors in real images are not very stable anyway, so this method still yielded good results.
What is your most important takeaway regarding data in the competition?
First, it was essential to have a validation set containing images from boats that do not appear in the training set; otherwise the model can learn to classify fish based on features of the boats themselves, which would not show up in the validation score but would cause a drop in accuracy on the stage 2 test set.
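A minimal sketch of such a boat-grouped split is shown below. The boat identifiers are assumed to be available (e.g. from manually clustering the images by boat); the function and variable names are illustrative.

```python
import random

def split_by_boat(image_boat_ids, val_fraction=0.25, seed=0):
    """Split image indices into train/validation sets so that no boat
    contributes images to both.  `image_boat_ids[i]` is the boat
    identifier of image i."""
    boats = sorted(set(image_boat_ids))
    random.Random(seed).shuffle(boats)
    n_val = max(1, int(len(boats) * val_fraction))
    val_boats = set(boats[:n_val])
    train_idx = [i for i, b in enumerate(image_boat_ids) if b not in val_boats]
    val_idx = [i for i, b in enumerate(image_boat_ids) if b in val_boats]
    return train_idx, val_idx
```

scikit-learn's `GroupKFold` implements the same idea as a cross-validation splitter, with the boat ID passed as the group label.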
Second, fish sizes vary greatly across the dataset, so it was clearly useful to address scale explicitly.
Third, there are many night vision images with a different color distribution, so handling night vision images separately also improved our scores.
In addition, the extra data other teams published on the forum seemed to contain many images in which the fish looked different from fish lying on boat decks, so it was crucial to filter out that portion of the data.
Finally, we annotated the original training images with polygons, which let us obtain accurate bounding boxes on rotated images; otherwise the boxes would contain a lot of background (if the axis-aligned bounding box of the rotated box were treated as ground truth).
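The benefit of polygons can be illustrated with a small sketch: rotating the polygon vertices and then taking their axis-aligned bounding box gives a tight box, whereas taking the bounding box of an already axis-aligned box after rotation inflates it with background. This is a pure-NumPy illustration, not the team's annotation tooling.

```python
import numpy as np

def rotate_points(points, angle_deg, center):
    """Rotate an N x 2 array of points by angle_deg around center."""
    a = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a),  np.cos(a)]])
    return (points - center) @ rot.T + center

def bbox_of(points):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of 2-D points."""
    xmin, ymin = points.min(axis=0)
    xmax, ymax = points.max(axis=0)
    return xmin, ymin, xmax, ymax

def bbox_area(box):
    xmin, ymin, xmax, ymax = box
    return (xmax - xmin) * (ymax - ymin)
```

For example, a 2x2 axis-aligned box rotated by 45 degrees has an axis-aligned bounding box of area 8, double the true area 4; boxing a rotated box and then rotating again compounds this looseness, while a rotated polygon always yields the tightest axis-aligned box.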
What tools did you use?
We used a customized version of the py-R-FCN repository (which also includes Faster R-CNN): https://github.com/Orpine/py-R-FCN.
What did you do in this competition?
We spent some time annotating data, finding useful additional data from images posted on the forum, finding the right augmentations to train the model, reviewing the generated validation image predictions, and checking for any false patterns the model might have learned.
What was your hardware setup?
Two NVIDIA GTX 1080s and one NVIDIA TITAN X.
What were the training and prediction run times for your winning solution?
Very rough estimates: training on a GTX 1080 took about 50 hours, and prediction took 7-10 seconds per image. Our best single model, which was actually more accurate than the full ensemble, could be trained in about 4 hours and needed only 0.5 seconds per prediction.
Do you have any advice for those just starting in data science?
Start by reading introductory materials, then gradually begin reading papers, try to solve machine learning problems hands-on to develop intuition, check trained models, and strive to understand what went wrong. Computer vision problems are quite good for practice. Learning to enjoy the machine learning process requires long-term effort, and maintaining interest is key to staying motivated. Kaggle is the perfect platform for learning machine learning.
Original article: http://blog.kaggle.com/2017/07/07/the-nature-conservancy-fisheries-monitoring-competition-1st-place-winners-interview-team-towards-robust-optimal-learning-of-learning/