Source: Sensor Technology
Author: Mao Fuli
As science and technology evolve rapidly, the demands placed on the intelligence and autonomy of machinery and equipment keep growing. The hope is that machines can fully take over human roles, freeing people from heavy and dangerous work. The ability to perceive the surrounding environment as humans do has become the key to achieving intelligent autonomy in devices.

The broad definition of “image tracking” technology refers to locating objects captured by a camera through some means (such as image recognition, infrared, ultrasonic, etc.) and directing the camera to track that object, keeping it within the camera’s field of view. The narrow definition of “image tracking” technology is what we commonly discuss in daily life, which is tracking and capturing through “image recognition”.
Methods such as infrared and ultrasonic sensing are affected by the environment and require specialized auxiliary recognition equipment, so in practical applications they have gradually been replaced by “image recognition” technology. “Image recognition” works directly on the images captured by the camera, performing NCAST image differencing and clustering operations to locate the target object and direct the camera to track it.
The image tracking system employs a unique NCAST target shape feature detection method and requires no auxiliary equipment on the tracked object. As soon as the target enters the tracking area, the system can lock on and track it, centering the camera view on the locked target and controlling the camera to apply the corresponding scaling strategy. The system supports various customizable strategies and multi-level close-up modes; it is highly adaptable and unaffected by strong light, sound, electromagnetic interference, and other environmental factors.
Edge Detection of Target Objects
The shape features of objects generally do not change much. Compared to area-based matching methods, tracking methods based on the target shape contour can segment the target more accurately.
Edges are the most basic features of moving targets. In an image they appear as the sets of pixels around the target where the grayscale changes abruptly (step edges) or rises and falls like a ridge (roof edges); they are the parts of the image where local brightness changes most significantly.

Edge detection uses certain algorithms to locate positions with discontinuous grayscale changes, thereby identifying the boundary line between the target and the background in the image. The grayscale variation of the image can be represented by the grayscale gradient.
Gradient Operator
The gradient is the first derivative, and the gradient operator is the corresponding first-derivative operator. For an image with grayscale f(x, y), the gradient can be represented as

∇f = (∂f/∂x, ∂f/∂y)

and its magnitude |∇f| = √((∂f/∂x)² + (∂f/∂y)²) measures how fast the grayscale changes at each pixel; edges appear where this magnitude is large.
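As a concrete illustration, the widely used Sobel operator approximates the two partial derivatives ∂f/∂x and ∂f/∂y with a pair of 3×3 convolution kernels. The following is a minimal NumPy sketch; the edge padding and the plain double loop are choices made here for clarity, not part of the original system:

```python
import numpy as np

def sobel_gradient(img):
    """Gradient magnitude via the Sobel operator.

    gx approximates df/dx (horizontal change), gy approximates df/dy.
    Edge padding and the explicit double loop are chosen for clarity."""
    gx_k = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]], dtype=float)
    gy_k = gx_k.T
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = padded[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(win * gx_k)
            gy[i, j] = np.sum(win * gy_k)
    return np.hypot(gx, gy)  # |grad f| = sqrt(gx**2 + gy**2)
```

On a test image with a vertical step in grayscale, the magnitude is large along the step and zero in the flat regions, which is exactly the boundary line between target and background that edge detection seeks.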
After edge detection, the image must undergo contour tracking and contour expression. The purpose of contour tracking is to obtain the set of edge pixels. Contour expression involves processing the edge table through fitting, statistics, and approximation to obtain an intuitive representation of the target shape features, providing template information for subsequent matching.
The idea of contour tracking is:
1. Based on the extracted image edges, find a pixel on the contour to serve as the starting point;
2. Use certain “tracking criteria” to find the remaining contour pixels based on the features of this pixel.
The effectiveness of tracking depends mainly on the choice of the starting point and the tracking criteria.
Tracking Criteria:
According to the starting-point selection rule, find the boundary point in the lowest, leftmost position, take it as the starting point, and set the upper left as the initial search direction. Then:
1. If the pixel in the current search direction is a black point (feature point), it is a boundary point; otherwise rotate the search direction 45 degrees clockwise and check again, until the first black point (feature point) is found;
2. Take that black point as the new boundary point, rotate the current search direction 90 degrees counterclockwise, and continue searching in the same way;
3. Stop when the search returns to the initial boundary point.
With the search directions drawn as arrows, the contour tracking algorithm can be represented in a diagram.
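The tracking criterion above can be sketched in code. The following is a toy NumPy implementation assuming a binary image containing a single connected object; the direction table lists the eight neighbors clockwise starting from the upper left, with the row index growing downward as in image coordinates:

```python
import numpy as np

# Eight neighbour directions, clockwise starting from the upper left;
# (row, col) offsets with the row index growing downward.
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
        (1, 1), (1, 0), (1, -1), (0, -1)]

def trace_contour(binary):
    """Follow the outer contour of the single object in a binary image.

    Implements the criterion above: try the current search direction,
    rotating 45 degrees clockwise on failure; after each move, rotate
    the direction 90 degrees counterclockwise. Returns the ordered
    boundary pixels, starting from the lowest, leftmost one."""
    fg = np.argwhere(binary)
    if fg.size == 0:
        return []
    # Starting point: lowest row first, then leftmost column.
    start = max((tuple(p) for p in fg), key=lambda p: (p[0], -p[1]))
    contour = [start]
    cur, d = start, 0  # 0 = upper left, the initial search direction
    while True:
        for _ in range(8):
            r, c = cur[0] + DIRS[d][0], cur[1] + DIRS[d][1]
            if 0 <= r < binary.shape[0] and 0 <= c < binary.shape[1] \
                    and binary[r, c]:
                cur = (r, c)
                d = (d - 2) % 8  # rotate 90 degrees counterclockwise
                break
            d = (d + 1) % 8      # rotate 45 degrees clockwise
        else:
            return contour       # isolated pixel: no neighbour at all
        if cur == start:
            return contour
        contour.append(cur)
```

On a small filled square this returns exactly the perimeter pixels in order, skipping the interior, which is the edge table used by the contour expression step.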

The edge table obtained by the above algorithm can itself serve as the contour expression, or it can be further processed into a more compact form. Common contour expressions currently fall into three types: the approximation fitting curve method, the interpolation fitting curve method, and the statistical feature method.
Target Image Color Detection
To express the rich and diverse colors, people use a three-dimensional space composed of three parameters representing color to describe the colors of images. This three-dimensional space is referred to as color space.
For example: RGB format (red, green, blue three-primary color model), HSV format (hue, saturation, brightness model).
The RGB format is an industry color standard: colors are obtained by varying the red (R), green (G), and blue (B) channels and superimposing them. This standard covers almost all colors perceivable by human vision, making RGB one of the most widely used color systems today.

HSV format: the H parameter carries the color information, i.e., the position in the spectrum, and is expressed as an angle, with red, green, and blue spaced 120 degrees apart and complementary colors differing by 180 degrees. The saturation S is a ratio from 0 to 1, giving the selected color’s purity relative to its maximum purity; when S = 0 only grayscale remains. V indicates the brightness of the color and also ranges from 0 to 1; note that it has no direct relationship with light intensity.
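The relationship between the two color spaces can be illustrated with the textbook RGB-to-HSV conversion for a single pixel (a standard formula, not specific to this system). Note how complementary colors such as red and cyan come out 180 degrees apart in hue:

```python
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel (components in 0..255) to HSV, with
    H in degrees [0, 360) and S, V in [0, 1]."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx                                   # value = brightest channel
    s = 0.0 if mx == 0 else (mx - mn) / mx   # saturation: 0 means grey
    if mx == mn:
        h = 0.0                              # hue undefined for grey
    elif mx == r:
        h = (60.0 * (g - b) / (mx - mn)) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / (mx - mn) + 120.0
    else:
        h = 60.0 * (r - g) / (mx - mn) + 240.0
    return h, s, v
```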

For example, a histogram can express the color distribution characteristics of an image: 1. quantize the colors into intervals; 2. count the pixels falling within each interval; 3. perform histogram projection to obtain the color probability distribution image.
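The three steps can be sketched for a single hue channel. The bin count of 16 here is an arbitrary choice for illustration, and the projection step is implemented as the common back-projection of bin probabilities onto pixels:

```python
import numpy as np

def hue_histogram(hue, bins=16):
    """Steps 1 and 2: quantise hue values (degrees, 0-360) into `bins`
    intervals, count the pixels per interval, and normalise the counts
    into a probability distribution."""
    idx = np.clip((hue / 360.0 * bins).astype(int), 0, bins - 1)
    hist = np.bincount(idx.ravel(), minlength=bins).astype(float)
    return hist / hist.sum()

def back_project(hue, hist, bins=16):
    """Step 3: replace each pixel with the probability of its hue bin,
    yielding the colour probability image."""
    idx = np.clip((hue / 360.0 * bins).astype(int), 0, bins - 1)
    return hist[idx]
```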


Image Filtering and Morphological Processing Algorithms
During the extraction of moving targets, it is necessary to filter the images collected by the camera to eliminate noise and improve image quality, making the image clearer. Common types of noise include Gaussian noise, additive noise, and salt-and-pepper noise.
To eliminate the influence of noise and highlight certain features of the image, common methods include neighborhood mean filtering and median filtering.
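Median filtering is particularly effective against salt-and-pepper noise, since an outlier pixel rarely survives as the median of its neighborhood. A minimal sketch (edge padding is one possible boundary choice):

```python
import numpy as np

def median_filter(img, k=3):
    """k x k median filter: each pixel becomes the median of its
    neighbourhood. Edge padding is one possible boundary choice."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out
```

A single bright noise pixel in an otherwise uniform image is removed completely, whereas a mean filter would only smear it across its neighbors.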

After the extraction of moving targets, the resulting binary image may contain many voids and isolated noise points. To better locate and track the target, morphological processing of the detection results is required.
The main idea of morphological processing is to use a specific structural element as a tool to measure and extract image features (shapes, contours, etc.), specifically by assessing whether the structural element can be appropriately and effectively inserted into the interior of the image. Common morphological operations include dilation, erosion, opening, and closing operations.
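The four operations can be sketched for binary images. This is a plain NumPy illustration of the definitions, not a production implementation; `se` is the structural element, and zero padding at the borders is an assumption made here:

```python
import numpy as np

def erode(img, se):
    """Erosion: keep a pixel only if the structural element, centred
    there, fits entirely inside the foreground."""
    ph, pw = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            win = padded[i:i + se.shape[0], j:j + se.shape[1]]
            out[i, j] = np.all(win[se == 1] == 1)
    return out

def dilate(img, se):
    """Dilation: turn a pixel on if the element hits any foreground."""
    ph, pw = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            win = padded[i:i + se.shape[0], j:j + se.shape[1]]
            out[i, j] = np.any(win[se == 1] == 1)
    return out

def opening(img, se):   # erosion then dilation: removes isolated noise
    return dilate(erode(img, se), se)

def closing(img, se):   # dilation then erosion: fills small voids
    return erode(dilate(img, se), se)
```

Opening removes the isolated noise points left after target extraction, while closing fills the small voids inside the target, which is exactly the cleanup the binary detection result needs before locating and tracking.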
Image Matching
Image matching refers to the process of finding correspondences between images: searching a single frame for the region most similar to a given target, or finding, among a batch of images, the one most similar to the target.
Using image matching technology, it is possible not only to detect whether there is a target image in the search image but also to obtain the relative position information of the target in the image.
Matching algorithms can be broadly divided into two categories: region-based matching methods and feature-based matching methods.
Region-Based Matching Methods
These methods establish a similarity measure between the target template and the target image to be matched based on the grayscale information of the entire image, and then use a corresponding search algorithm to find the place with the highest similarity measure in the target image to be matched.
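One of the simplest similarity measures of this kind is the sum of squared grayscale differences (SSD), minimized by an exhaustive search over all template positions. A minimal sketch (the exhaustive scan is chosen for clarity; real systems use faster search strategies):

```python
import numpy as np

def match_template_ssd(image, template):
    """Exhaustively slide the template over the image and return the
    top-left corner minimising the sum of squared grayscale
    differences, together with that minimum score."""
    ih, iw = image.shape
    th, tw = template.shape
    best_score, best_pos = None, (0, 0)
    for i in range(ih - th + 1):
        for j in range(iw - tw + 1):
            patch = image[i:i + th, j:j + tw].astype(float)
            score = np.sum((patch - template) ** 2)
            if best_score is None or score < best_score:
                best_score, best_pos = score, (i, j)
    return best_pos, best_score
```

When the template is an exact crop of the search image, the minimum score is zero at the crop's position, giving the relative position information mentioned above.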

Feature-Based Matching Methods
These methods extract features from the image, such as feature points, edges, colors, textures, etc., process the extracted features to express them in a specific form (vector, histogram), and perform matching based on certain similarity measure criteria to establish correspondences between the target template and the image to be matched.
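As a small illustration of this idea, an image can be reduced to a normalized grayscale histogram (a simple feature vector) and two such histograms compared with the Bhattacharyya coefficient, one common similarity measure among several:

```python
import numpy as np

def gray_histogram(img, bins=8):
    """Normalised grayscale histogram, used here as a feature vector."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient between two normalised histograms:
    1.0 for identical distributions, 0.0 for disjoint ones."""
    return float(np.sum(np.sqrt(h1 * h2)))
```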

Application Fields of Image Tracking Technology
Currently, target tracking technology is mainly applied in the following fields:
1. Intelligent video surveillance: motion-based recognition (gait-based human identification, automatic object detection, etc.), automated monitoring (watching a scene to detect suspicious behavior), and traffic monitoring (collecting traffic data in real time to direct traffic flow);
2. Human-computer interaction: Traditional human-computer interaction is conducted through computer keyboards and mice. To enable computers to recognize and understand human postures, actions, gestures, etc., tracking technology is key;
3. Robot visual navigation: In intelligent robots, tracking technology can be used to calculate the motion trajectory of captured objects;
4. Virtual reality: 3D interaction and virtual character motion simulation in virtual environments directly benefit from research results in video human motion analysis, providing participants with richer interaction forms, where human tracking analysis is a key technology;
5. Medical diagnosis: tracking technology is widely applied in the automatic analysis of ultrasound and magnetic resonance image sequences. Because noise in ultrasound images often drowns out the useful information in a single frame, static analysis is very difficult; tracking technology can exploit the geometric continuity of targets and the temporal correlation across the sequence to yield more accurate results.
Editor: Xia Zhi