Monitoring Method for Ship Imitation Behavior Based on XGBoost

This article was published in “Command Information Systems and Technology”, 2022, Issue 5

Authors: Sui Yuan, Duan Ran, Bai Zheng

Citation Format: Sui Yuan, Duan Ran, Bai Zheng. Monitoring Method for Ship Imitation Behavior Based on XGBoost [J]. Command Information Systems and Technology, 2022, 13(5): 60-65.

Abstract

To address the problem of effectively identifying ship types in maritime safety supervision, a monitoring method for ship imitation behavior based on XGBoost is proposed. First, the trajectory features of different types of ships are calculated from historical trajectory data. Then, the XGBoost algorithm is used to train a ship classification model on these trajectory features. Finally, an automatic detection process for ship types is proposed to monitor ship imitation behavior. Experimental results show that the method achieves high accuracy, fast training convergence, and high efficiency in ship type classification.

Introduction

With the increasing number of ships navigating in coastal and inland waters in China, the regulatory pressure on water-related departments is growing. To supervise ships dynamically and effectively, regulatory departments need information such as ship names, types, locations, and speeds. Ship names and types are obtained primarily from the shipborne Automatic Identification System (AIS), but some ships (especially fishing vessels) report inaccurate information. Some fishing vessels operate illegally in closed fishing zones or during fishing bans for economic gain and deliberately report incorrect ship types to evade penalties, which undermines dynamic ship control. Furthermore, fishing vessels that imitate other ship types pose significant safety risks. From 2016 to 2018, there were 223 collision accidents of general grade and above nationwide, of which 80 were collisions between merchant ships and fishing vessels, accounting for 35.9%. Therefore, effectively identifying fishing vessels that imitate other types is crucial for preventing collisions between merchant ships and fishing vessels.

Through the efforts of organizations such as the International Maritime Organization (IMO) and the International Association of Lighthouse Authorities (IALA), AIS standards have been formally established, and all ships are required to install AIS. AIS data are broadcast periodically by ships during navigation and are characterized by massive scale, spatiotemporal structure, and frequent updates. Under normal circumstances, the reporting period for Class A shipborne equipment is less than 10 seconds, and for Class B shipborne equipment at speeds greater than 2 knots it is less than 30 seconds, so AIS data are abundant. In actual ship supervision, when the reported type of a ship cannot be trusted, supervisors can only estimate the ship type from experience, which is both inefficient and inaccurate and urgently needs improvement. AIS messages contain the ship's unique code (the Maritime Mobile Service Identity, MMSI), the ship type, and the latitude, longitude, heading, speed, and timestamp. The AIS data of different ships have different characteristics, which provides a data basis for learning the navigation characteristics of different ship types. This article proposes a monitoring method for ship imitation behavior based on XGBoost: historical trajectory data are used to generate classification features, a classification model for ship types is trained on them, and the types of ships in real-time navigation are predicted to achieve automatic monitoring of ship imitation behavior.

1 Ship Type Classification Algorithm

Determining ship types from historical trajectory data is a classification problem in supervised learning. Typical algorithms include deep learning and boosting. Deep learning is adept at handling complex data from which features are difficult to extract (such as images and speech). For features that are already established, deep learning applies complex transformations to the input data in a high-dimensional space to obtain a classification model, and if the transformation fails, the data can become even more entangled. Boosting is an ensemble learning (EL) method whose basic idea is to combine weak learners with low prediction accuracy into a strong learner with high accuracy. Weak learners are trained on the samples, misclassified samples are given greater weight, and training is repeated until a satisfactory classification model is obtained. Common algorithms include AdaBoost, Gradient Boosting Decision Tree (GBDT), and XGBoost. Boosting handles established features well and can improve the efficiency of the learning system, and historical trajectory data contain well-defined features such as speed, position, and heading, so boosting algorithms can achieve higher learning efficiency and success rates here.
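
The reweighting idea described above can be illustrated with a single AdaBoost round. This is a minimal sketch of the textbook update, not code from the article; the function name `adaboost_reweight` and the -1/+1 label encoding are assumptions made for illustration.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """Run one AdaBoost reweighting round.

    weights: current sample weights, assumed to sum to 1.
    y_true, y_pred: labels and weak-learner predictions in {-1, +1}.
    Returns (alpha, new_weights), where alpha is the weak learner's vote.
    """
    # Weighted error of the weak learner on the current distribution.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Clamp to avoid log(0) or division by zero for perfect/useless learners.
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones down.
    raw = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [r / total for r in raw]

alpha, new_w = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
```

After the update, the single misclassified sample carries half of the total weight, which is why a mislabeled (noisy) sample can dominate later rounds.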

This article seeks a method for judging the types of a massive number of ships in real time, which requires algorithms with high accuracy and efficiency. The AdaBoost algorithm achieves boosting by increasing the weight of misclassified samples in each iteration, but it is overly sensitive to noisy samples. Since the data reported by ships contain some erroneous information, AdaBoost does not perform well here. The GBDT algorithm has high prediction accuracy and handles outliers well, but because it does not parallelize the work within each training iteration, its training time on massive datasets is unacceptable. The XGBoost algorithm can be seen as an industrial-strength improvement of GBDT. It is more tolerant of noisy data, and when partitioning the tree structure it uses an approximate greedy algorithm that sorts the data to determine split points, which allows parallel computation at feature granularity and significantly improves computation speed compared to GBDT. XGBoost adds a regularization term to GBDT to avoid overfitting. Its objective function is fitted with a second-order Taylor expansion, which is more accurate than the first-order expansion of GBDT and faster to compute. XGBoost also borrows from the random forest method by supporting sample and column sampling during training, which enhances the model's generalization ability. Therefore, this article uses XGBoost as the foundational algorithm for ship type judgment.
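
For reference, the regularized second-order objective mentioned above is usually written as follows. This is the standard form from the XGBoost literature rather than a formula reproduced from this article; the symbols are introduced here for illustration: $l$ is the loss, $f_t$ the tree added at round $t$, $g_i$ and $h_i$ the first and second derivatives of the loss with respect to the previous prediction $\hat{y}_i^{(t-1)}$, $T$ the number of leaves, $w_j$ the leaf weights, and $\gamma$, $\lambda$ the regularization coefficients.

```latex
\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
    + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}
```

The $\tfrac{1}{2} h_i f_t^{2}(x_i)$ term is the second-order part of the expansion, and $\Omega$ is the regularization term that penalizes large or overly detailed trees.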

2 Feature Generation

2.1 Feature Selection

Figure 1 Comparison of Ship Navigation Curves Under Different Navigation Ratio Conditions

Figure 2 Probability Distribution of Three Types of Ship Speeds and Schematic of Second-Order Central Moment and Origin Moment

Using 0.5 m/s² as the discretization unit, the acceleration probability distributions of passenger ships, cargo ships, and fishing vessels are computed over a random trajectory segment, and the kurtosis and skewness are calculated for each of the three ship types. The probability distributions of speed differences and the kurtosis and skewness schematic for the three ship types are shown in Figure 3, which indicates that the kurtosis and skewness of consecutive speed differences differ significantly between ship types.

Figure 3 Probability Distribution of Speed Differences and Schematic of Kurtosis and Skewness for Three Types of Ships
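
The discretized statistics described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the article's implementation: the function names are hypothetical, values are binned by the lower edge of each 0.5-wide bin, and kurtosis is the plain (non-excess) fourth standardized moment, since the source does not state which definition it uses.

```python
import math
from collections import Counter

def speed_differences(speeds):
    """Differences between consecutive values of a speed series."""
    return [b - a for a, b in zip(speeds, speeds[1:])]

def probability_distribution(values, unit=0.5):
    """Discretize values into bins of width `unit` (keyed by the bin's
    lower edge) and return the empirical probability of each bin."""
    if not values:
        return {}
    bins = Counter(math.floor(v / unit) for v in values)
    n = len(values)
    return {k * unit: c / n for k, c in sorted(bins.items())}

def skewness_kurtosis(dist):
    """Skewness and non-excess kurtosis of a discrete distribution
    given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    if var == 0:
        return 0.0, 0.0  # degenerate distribution: all mass in one bin
    m3 = sum(p * (x - mean) ** 3 for x, p in dist.items())
    m4 = sum(p * (x - mean) ** 4 for x, p in dist.items())
    return m3 / var ** 1.5, m4 / var ** 2

dist = probability_distribution(speed_differences([5.0, 5.5, 5.5, 5.0, 5.0]))
skew, kurt = skewness_kurtosis(dist)
```

A symmetric distribution such as this toy example has zero skewness; a vessel that accelerates sharply more often than it decelerates would show a positive value.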

2.2 Feature Calculation

Figure 4 Schematic of Sliding Window Feature Calculation

3 Monitoring Method for Ship Imitation Behavior

The monitoring method for ship imitation behavior based on XGBoost (the method in this article) consists of three steps: 1) processing the dynamic point data of continuous trajectories in historical ship data to generate a feature dataset; 2) constructing a classifier with a boosting algorithm and training it to generate navigation feature models for different ship types; 3) calculating classification features in real time for actual ship targets and judging their types with the ship feature model. The overall process of this method is shown in Figure 5.

Figure 5 Overall Process of the Method in This Article

3.1 Model Generation

To improve the classification accuracy of the XGBoost algorithm model, this article uses a binary classification approach for model training. Each model training uses fishing vessels and another type of vessel to be judged, where fishing vessels are negative samples, and the other vessel type is the positive sample. To effectively detect fishing vessels that imitate other vessels, this article perturbs the evaluation function of ensemble learning, increasing the corresponding type’s weight, thus accelerating the training iteration process. In practice, during the calculation of the evaluation function, weights can be increased for misclassified fishing vessel samples, thereby increasing the evaluation function’s value when misjudging fishing vessel types, making model training more inclined to reduce the misjudgment rate of fishing vessels. The misjudgment rate formula for fishing vessels is as follows:

[Formula image not reproduced in the source text]
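
The weighting idea can be illustrated with a simple weighted error rate. This is only a sketch under assumptions: it is not the article's formula, and the function name `weighted_misjudgment_rate`, the weight value, and the label encoding (0 for fishing vessels as negative samples, 1 for the other type) are chosen here for illustration.

```python
def weighted_misjudgment_rate(y_true, y_pred, fishing_weight=2.0):
    """Weighted error rate in which each fishing-vessel sample (label 0)
    counts `fishing_weight` times as much as a sample of the other type."""
    penalty = 0.0
    total = 0.0
    for t, p in zip(y_true, y_pred):
        w = fishing_weight if t == 0 else 1.0
        total += w
        if t != p:
            penalty += w  # a misjudged fishing vessel adds the larger weight
    return penalty / total if total else 0.0

# One fishing vessel misjudged as the other type, everything else correct.
unweighted = weighted_misjudgment_rate([0, 0, 1, 1], [1, 0, 1, 1], 1.0)
weighted = weighted_misjudgment_rate([0, 0, 1, 1], [1, 0, 1, 1], 2.0)
```

With the larger weight, the same single mistake yields a higher evaluation value, so hyperparameter tuning that minimizes this value is pushed toward fewer misjudged fishing vessels.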

The steps for generating the XGBoost algorithm model are as follows:

1) Divide all data by ship type, removing outliers from the message data of each type of ship, and sort the data by ship ID and time in ascending order, storing the sorted ship type data as files.

2) Read the data of one ship type and segment the trajectories. The trajectory is split between two consecutive messages if any of the following holds: the ship IDs differ; the time difference exceeds 300 seconds; or the distance exceeds 500 meters.

3) Filter the continuous trajectories of the segmented type of ship based on the number of trajectory points, removing those with fewer than 60 points.

4) Take a segment of the trajectory, set a sliding window of size 60 and step size 5 for feature calculation, and after completing feature calculation for all continuous trajectory data, store the feature values in a sample file.

5) Repeat steps 2) to 4) for all types of ship message data to complete feature value calculation, obtaining feature samples for all types of ships.

6) Read the sample files for fishing vessels and another type of vessel, using fishing vessels as negative samples and other vessels as positive samples, generating corresponding binary classification labels, mixing the sample files after balancing the samples.

7) Normalize the samples and save the normalization model, selecting 90% of the normalized samples for model training and 10% for model evaluation.

8) Build the XGBoost classification trainer, set the classifier's hyperparameters and training termination conditions, input the training samples into the classifier for model training, and score the evaluation samples using equation (9) as the evaluation function.

9) Repeat step 8) to adjust the classifier’s hyperparameters to minimize the evaluation function value, saving the model obtained at this time.

10) Repeat steps 8) to 9) to process all types of ship data except for fishing vessels, completing binary classification model training for all ships and fishing vessels, obtaining the classification model and normalization model.
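
Steps 2) to 4) above can be sketched as follows. This is a minimal illustration, not the article's code: the `AisPoint` record, the function names, and the use of the haversine formula for distance are assumptions, and the input is assumed to be already sorted by ship ID and time as in step 1).

```python
import math
from typing import List, NamedTuple

class AisPoint(NamedTuple):
    ship_id: str
    t: float        # timestamp in seconds
    lat: float      # degrees
    lon: float      # degrees
    speed: float
    heading: float

def haversine_m(a: AisPoint, b: AisPoint) -> float:
    """Great-circle distance in meters between two points."""
    r = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))

def segment(points: List[AisPoint], max_gap_s=300, max_dist_m=500,
            min_len=60) -> List[List[AisPoint]]:
    """Split sorted points into continuous trajectories (step 2) and
    drop trajectories with fewer than `min_len` points (step 3)."""
    segments, current = [], []
    for p in points:
        if current:
            prev = current[-1]
            if (p.ship_id != prev.ship_id
                    or p.t - prev.t > max_gap_s
                    or haversine_m(prev, p) > max_dist_m):
                if len(current) >= min_len:
                    segments.append(current)
                current = []
        current.append(p)
    if len(current) >= min_len:
        segments.append(current)
    return segments

def sliding_windows(seg: List[AisPoint], size=60, step=5):
    """Windows of `size` points advanced by `step` points (step 4)."""
    return [seg[i:i + size] for i in range(0, len(seg) - size + 1, step)]
```

Each window returned by `sliding_windows` would then be passed to the feature calculation of Section 2 to produce one feature sample.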

3.2 Imitation Monitoring

To monitor ship imitation behavior with the method in this article, real-time ship trajectory messages are first received and recorded, the feature samples to be judged are calculated from them, and these samples are then input into the classification model. For example, if a ship reported as a non-fishing vessel is predicted to be a fishing vessel by the classification model, the ship is flagged as a possible fishing vessel imitating another ship type. The steps for generating real-time feature samples are as follows:

1) For ships reported as non-fishing vessels by AIS, record their trajectory point messages;

2) Read the ship's historical trajectory point messages. If the trajectory contains more than 60 points, the time difference between consecutive points does not exceed 300 seconds, and the distance between consecutive points does not exceed 500 meters, perform one feature calculation;

3) For ships meeting the conditions in step 2), calculate the navigation ratio, trajectory centroid, and time median of the last 60 historical trajectory points using the formulas in Section 2.1. Also compute the series of speeds, headings, consecutive speed differences, and consecutive heading differences over the same 60 points, and from these calculate the central moments and origin moments, as well as the kurtosis and skewness after discretized statistics;

4) For the features calculated in step 3), use the normalization model obtained from step 7) of section 3.1 for normalization, obtaining the final features for type judgment.
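
The normalization in step 4) and the final judgment can be sketched as follows. This is a hedged illustration only: the source does not state which normalization it uses, so min-max scaling is an assumption, and `predict_fishing_prob` is a hypothetical stand-in for the trained classification model.

```python
def fit_minmax(samples):
    """Fit a per-feature (min, max) normalization model on training samples."""
    return [(min(col), max(col)) for col in zip(*samples)]

def apply_minmax(model, features):
    """Scale one feature vector with a previously saved normalization model."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(features, model)]

def flag_imitation(reported_type, features, norm_model,
                   predict_fishing_prob, threshold=0.5):
    """Return True when a ship reported as a non-fishing vessel is
    predicted to be a fishing vessel."""
    if reported_type == "fishing":
        return False  # only ships reported as non-fishing are checked
    x = apply_minmax(norm_model, features)
    return predict_fishing_prob(x) >= threshold

# Toy usage with a stub in place of the trained model.
norm_model = fit_minmax([[0.0, 10.0], [10.0, 30.0]])
suspect = flag_imitation("cargo", [5.0, 20.0], norm_model, lambda x: 0.9)
```

The key design point is that the normalization parameters come from training (step 7) of Section 3.1) and are reused unchanged at monitoring time, so real-time features are scaled exactly as the training samples were.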

Figure 6 Imitation Behavior Monitoring Process

4 Experimental Results and Analysis

4.1 Experimental Design

Simulation experiments were conducted on historical data for ship types including fishing vessels, passenger ships, cargo ships, and oil tankers. The operating system was CentOS 7.4, the development language was Python 3.6, and the XGBoost package was the Python API, version 0.81. The central processing unit (CPU) of the training platform was an Intel Xeon E5-1620 at 3.5 GHz, with 32 GB of memory. The experimental data consisted of continuous AIS data from a maritime area over two months, totaling about 90 million records, with 90% of the data used for model training and 10% for model validation. The following two experiments were conducted:

1) Experiment 1: Using all AIS data, classifiers were constructed with the XGBoost, GBDT, and Random Forest algorithms. Training and evaluation feature samples were generated using the method in this article, and classification models were trained with ten-fold cross-validation. This experiment compared the training time, model size, and model accuracy of the three algorithms, as well as the speed at which each algorithm's model judged random samples.

2) Experiment 2: Using all AIS data, the sliding window method was used to directly concatenate the longitude, latitude, speed, heading, and time of 60 consecutive points into 300-dimensional features, which were compared with the 24-dimensional features generated by the method in this article. In both cases, classifiers were constructed with the XGBoost algorithm, hyperparameters were tuned, and ten-fold cross-validation was used. The experiment compared the model training speed, model size, classification accuracy, and classification judgment time on the validation set between the direct concatenation method (300 dimensions) and the method in this article (24 dimensions).
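
The baseline in Experiment 2 can be sketched as below. This is an illustration under an assumption: the five per-point items are taken to be longitude, latitude, speed, heading, and time, which is what makes 60 points yield 60 × 5 = 300 values. The dictionary keys and the function name are hypothetical.

```python
FIELDS = ("lon", "lat", "speed", "heading", "t")  # assumed per-point items

def concat_features(window):
    """Flatten a window of points into one feature vector by direct
    concatenation: 60 points x 5 items = 300 dimensions."""
    return [point[name] for point in window for name in FIELDS]

# Toy window of 60 consecutive points.
window = [{"lon": 120.0 + i, "lat": 30.0, "speed": 5.0,
           "heading": 90.0, "t": float(i)} for i in range(60)]
vector = concat_features(window)
```

Such a vector keeps every raw value, whereas the 24-dimensional features summarize the window with statistics, which is consistent with the smaller model and faster training reported for the method in this article.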

4.2 Result Analysis

In Experiment 1, models were trained using the three algorithms, and the experimental result data comparisons are shown in Table 1, where the binary classification models include fishing vessel vs passenger vessel, fishing vessel vs cargo vessel, and fishing vessel vs oil tanker. From Table 1, it can be seen that the XGBoost algorithm has a model accuracy close to that of the GBDT algorithm, but its training speed is faster than that of the GBDT algorithm; the XGBoost algorithm has a training speed similar to that of the Random Forest algorithm, but with slightly higher accuracy and a smaller model size. The models trained using the three algorithms were used to predict sample types, and the time taken for sample classification judgment was analyzed. The detection time for different sample sizes is shown in Table 2. From Table 2, it can be seen that the XGBoost algorithm outperforms the GBDT and Random Forest algorithms in terms of efficiency for both small and large-scale data detection and judgment, making it an ideal algorithm for real-time monitoring of ship type imitation.

Table 1 Comparison of Experimental Result Data for Three Algorithms

Table 2 Detection Time Under Different Sample Sizes

In Experiment 2, samples were trained and validated with the direct concatenation method (300 dimensions) and the method in this article (24 dimensions), using the XGBoost algorithm to construct the classifiers, with a sample size of 100. The results for the two feature generation methods are compared in Table 3. Compared to the direct concatenation method, the method in this article has higher classification accuracy, faster training convergence, and better classification judgment efficiency, achieving effective supervision of ship imitation behavior.

Table 3 Comparison of Experimental Results for Two Feature Generation Methods

Conclusion

To address the illegal behavior of some ships imitating other ships by tampering with AIS device information, a monitoring method for ship imitation behavior based on XGBoost has been designed and experimentally validated. After integrating real-time ship data, this method can detect inconsistencies between the navigation features of imitating ships and those of the ships being imitated, generating early warnings for ship imitation behavior. Experimental results show that the features extracted by this method do not clearly distinguish between passenger ships, cargo ships, and cruise ships. Future work will analyze the features of more types of ships to improve the accuracy of this method for classifying other types of ships.
