Implementing Night Vision Imaging with Convolutional Neural Networks

The Thermal Ranger system from the American company Owl Autonomous Imaging can locate and classify targets such as pedestrians in the dark using only a thermal infrared camera and a trained convolutional neural network (CNN). Thermal imaging is well suited to use after dark because it relies on the infrared energy emitted by the objects themselves rather than on illumination from streetlights or headlights. Thermal infrared also penetrates rain, fog, and other obscurants relatively easily, so it provides useful images even in harsh conditions.

Night Vision Imaging

However, detection alone is not enough to support decision-making: each imaged target must also be classified by type and tagged with its distance from the vehicle. The Owl system uses a complex CNN to extract, from a single thermal image, all the information needed for automated emergency braking decisions.

Detection, Recognition, Identification (DRI)

Figure 1 Typical requirements for detection, recognition, and identification. Classification lies between recognition and identification.

The images provided to the CNN must contain enough detail for the required information to be extracted. The amount of usable information in an image depends on the number of sampling points (pixels) that fall on the objects of interest. Figure 1 illustrates the three levels conventionally assigned on the basis of what human observers can conclude from a video monitor: detection, recognition, and identification. When the observer is a CNN, a fourth category, classification, becomes crucial. It lies between recognition and identification and corresponds to about ten pixels per meter. This resolution is sufficient for the CNN to distinguish adults from children and to determine a pedestrian's posture.
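The ten-pixels-per-meter figure can be related to camera geometry with a little trigonometry. The sketch below is only an illustration: the sensor width (640 pixels) and horizontal field of view (24 degrees) are hypothetical values chosen for the example, not specifications of the Thermal Ranger camera, and the function names are made up here.

```python
import math


def pixels_per_meter(h_pixels: int, hfov_deg: float, range_m: float) -> float:
    """Horizontal pixels that cover one meter of scene width at a given range."""
    # Width of the scene seen by the camera at this range (pinhole model).
    scene_width_m = 2.0 * range_m * math.tan(math.radians(hfov_deg) / 2.0)
    return h_pixels / scene_width_m


def max_range_for_density(h_pixels: int, hfov_deg: float, required_ppm: float) -> float:
    """Farthest range at which the required pixel density is still met."""
    return h_pixels / (2.0 * required_ppm * math.tan(math.radians(hfov_deg) / 2.0))


# Hypothetical camera parameters, assumed only for this example.
H_PIXELS = 640
HFOV_DEG = 24.0
CLASSIFICATION_PPM = 10.0  # "about ten pixels per meter" from the text

for range_m in (25, 50, 100, 150):
    ppm = pixels_per_meter(H_PIXELS, HFOV_DEG, range_m)
    status = "meets" if ppm >= CLASSIFICATION_PPM else "is below"
    print(f"{range_m:>4} m: {ppm:5.1f} px/m ({status} the classification level)")

limit_m = max_range_for_density(H_PIXELS, HFOV_DEG, CLASSIFICATION_PPM)
print(f"Classification range limit for this assumed camera: {limit_m:.0f} m")
```

Under this simple model, pixel density falls in proportion to range, so a narrower field of view or a wider sensor extends the distance at which classification remains possible.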

Basic Principles of CNN

Figure 2 CNN training images, with the objects to be classified indicated by color.

A convolutional neural network is a computer simulation of neurons that can be trained to respond when it is shown objects that correlate strongly with those labeled in a set of training images. Figure 2 shows a typical training image in which the objects to be detected are indicated by color. The coloring is done manually by experienced technicians to ensure that the image data used in training is reliable.

When training begins, the CNN has little information, so its accuracy is low and its failure rate (loss) is high. With additional training, the CNN's accuracy improves and the loss decreases. However, repeating the training data over and over may lead the CNN to recognize only new images that closely resemble the training images. This phenomenon, known as overfitting, arises when the CNN is shown the same training images too many times. Attempting to make the CNN perform perfectly during training almost always results in additional loss on new images. Training must therefore be halted once the CNN is operating close to its optimum.
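One common way to decide when to halt is early stopping: the loss is measured after each training pass on held-out images that the CNN never trains on, and training stops once that loss has failed to improve for a set number of passes. The article does not say how Owl decides when to stop, so the sketch below is a generic illustration; the function name, the `patience` parameter, and the loss values are all assumptions.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of held-out losses.

    Training is considered finished once the held-out loss has failed to
    improve by more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best result on unseen images: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch

    # Patience never ran out: training ended at the last recorded epoch.
    return best_epoch, len(val_losses) - 1


# Made-up held-out losses: they fall at first, then rise as overfitting sets in.
losses = [0.90, 0.70, 0.50, 0.45, 0.47, 0.50, 0.55, 0.60]
best, stop = early_stopping_epoch(losses, patience=3)
print(f"Keep the weights from epoch {best}; training halted at epoch {stop}.")
```

In practice the weights saved at the best epoch are the ones that would be deployed, since later epochs only fit the training images more closely.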

Training the Thermal Ranger System

The Thermal Ranger system requires two types of scene information to formulate its report on scene content: the classification of each imaged target and its position in the image, as well as the distance from the camera to each classified object.

To begin this process, a CNN generates a map of the entire image, assigning to each pixel a value known as “inverse depth”, which is inversely proportional to the distance from the camera to the object imaged at that pixel. To ensure accurate distance reporting in the final results, this CNN is trained on images that contain significant content across the useful distance range and for which the true distance to that content is known.
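Because inverse depth is inversely proportional to distance, converting the map to meters only requires a proportionality constant. The article does not say how that constant is obtained, so the sketch below assumes it is fitted by least squares from a few pixels whose true distance is known. The function names, the `scale` parameter, and the sample values are hypothetical.

```python
def fit_scale(inv_depths, true_distances_m):
    """Least-squares scale k for the model distance = k / inverse_depth."""
    xs = [1.0 / v for v in inv_depths]
    numerator = sum(x * d for x, d in zip(xs, true_distances_m))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def inverse_depth_to_distance(inv_depth_map, scale=1.0, min_inv_depth=1e-6):
    """Convert a 2-D inverse-depth map (list of rows) to distances in meters."""
    # Clamp very small values so that far-away pixels do not divide by zero.
    return [[scale / max(v, min_inv_depth) for v in row] for row in inv_depth_map]


# Made-up calibration samples: inverse-depth outputs and measured distances.
scale = fit_scale([0.5, 0.25, 0.1], [4.0, 8.0, 20.0])

# Made-up 2x3 inverse-depth map standing in for a network output.
inv_depth_map = [
    [0.50, 0.25, 0.10],
    [0.40, 0.20, 0.05],
]
distance_map = inverse_depth_to_distance(inv_depth_map, scale=scale)
for row in distance_map:
    print([round(d, 1) for d in row])
```

One commonly cited reason for predicting inverse depth rather than depth is that it stays bounded for distant content, approaching zero instead of growing without limit.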

Another CNN is trained on images of the objects to be classified so that it can place a bounding box around each one, fixing its position in the image. The contents of each bounding box can then be segmented into pixels that cover the object and pixels that show the background.

The outputs of the two CNNs are then combined so that distances are assigned only to the pixels that represent the object. The result is a report for each object that includes its class, its position in the image, and its distance from the camera.
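The combination step can be sketched as follows: for each detected object, only the distance values under its segmentation mask are kept and reduced to a single number. The article does not state how the per-pixel distances are aggregated, so the median used here is an assumption, as are the data layout, the field names, and the sample values.

```python
from statistics import median


def object_distance(distance_map, mask, box):
    """Median distance over the object pixels inside a bounding box.

    box is (x0, y0, x1, y1) in pixel coordinates with x1 and y1 exclusive.
    mask has the same shape as distance_map and is True on object pixels.
    """
    x0, y0, x1, y1 = box
    values = [
        distance_map[y][x]
        for y in range(y0, y1)
        for x in range(x0, x1)
        if mask[y][x]  # skip background pixels inside the box
    ]
    return median(values) if values else None


def build_report(detections, distance_map):
    """One entry per object: class, position in the image, and distance."""
    return [
        {
            "class": det["class"],
            "box": det["box"],
            "distance_m": object_distance(distance_map, det["mask"], det["box"]),
        }
        for det in detections
    ]


# Made-up 3x4 distance map: a near object in front of a far background.
distance_map = [
    [50.0, 50.0, 50.0, 50.0],
    [50.0, 12.0, 12.5, 50.0],
    [50.0, 11.5, 50.0, 50.0],
]
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, False, False],
]
detections = [{"class": "pedestrian", "box": (1, 1, 3, 3), "mask": mask}]
print(build_report(detections, distance_map))
```

Restricting the calculation to masked pixels matters because a bounding box usually contains some background, which would otherwise pull the reported distance toward the far scene behind the object.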

Using thermal images, the CNN can provide crucial information to automated braking systems and to drivers, helping to reduce pedestrian accidents.

This article is reproduced from the “Jinhang Optoelectronics” WeChat public account

(Editor: Ma Huaixu, Xi’an Institute of Applied Optics)

Welcome to reprint, please indicate the source!