Human Activity Recognition Algorithm Based on IMU Sensors and Deep Metric Learning

“Mobile Communications” 2024 Issue 3

01【Research and Discussion】

(First Published Online: 2023-04-25)

Shi Shang, He Zhengran, Dong Heng

(School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023)

【Abstract】Human Activity Recognition (HAR) can be defined as determining a person’s postures and daily activities from a series of observations of the person and the surrounding environment. Many studies have applied deep learning (DL) techniques to HAR; however, existing DL-based HAR methods suffer from high complexity, heavy computational requirements, and insufficient generalization and robustness. To address these issues, this paper proposes a HAR method called RMDML, which is based on the IMU sensors built into smartphones and combines the lightweight neural network Res-MLP with deep metric learning for feature embedding. The aim is to extract generalizable features that are separable and discriminative, thereby improving the model’s recognition and generalization performance. The RMDML model achieves an accuracy of 97.26% on the public UCI HAR dataset, surpassing several common HAR algorithms and demonstrating the effectiveness of the proposed method.

【Keywords】Human Activity Recognition; Inertial Measurement Unit Sensors; Residual Multi-Layer Perceptron; Metric Learning

doi:10.3969/j.issn.1006-1010.20230324-0001

Classification Number: TN929.5 Document Identifier Code: A

Article Number: 1006-1010(2024)03-0131-06

Citation Format: Shi Shang, He Zhengran, Dong Heng. Human Activity Recognition Algorithm Based on Inertia Measurement Unit Sensors and Deep Metric Learning [J]. Mobile Communications, 2024,48(3): 131-136.

0 Introduction

Human Activity Recognition (HAR) is a technology that determines a person’s postures and daily activities from a series of observations of the person and the surrounding environment. With the arrival of the 5G era^[1] and the rapid development of IoT technology, HAR has been widely applied in daily life and has become one of the core functions of intelligent products. HAR has great development potential and broad application scenarios, such as human-computer interaction^[2], sports activities^[3], smart homes^[4], healthcare^[5], and crime monitoring^[6]. As IoT technology advances, more scholars are focusing on HAR methods based on Inertial Measurement Unit (IMU) sensor data. IMU sensors include accelerometers, gyroscopes, and magnetometers, and are widely used in smartphones, automotive electronics, aerospace, and military fields. IMU-based HAR methods are low in cost and power consumption, and their data collection is not constrained by the environment, but they have the inconvenience of requiring users to wear sensors. With the spread of mobile network technology and smartphones, HAR algorithms based on the built-in IMU sensors of smartphones have started to gain attention^[7], because they can meet everyday activity recognition needs without additional wearable sensors. Meanwhile, as the computing power of smartphone chips continues to improve, smartphones are increasingly capable of processing data, providing a good environment for running such algorithms.

With the development of deep learning (DL) technology, DL has become a leading approach for recognition tasks and is widely applied in 6G communication technology^[8], automatic modulation classification^[9], channel state information^[10], radio frequency fingerprint recognition^[11], and signal recognition^[12]. The HAR field has also begun to use deep learning techniques to improve model performance and efficiency. Figure 1 illustrates a deep learning pipeline based on the built-in IMU sensors of smartphones: the accelerometer and gyroscope collect data, the data is preprocessed, and a neural network then extracts high-dimensional features to recognize the activity. Reference [13] proposed a HAR algorithm that uses a DNN as the neural network model. Reference [14] constructed a CNN-based model that recognizes human activities from data collected by the three-axis accelerometer and other IMU sensors integrated in the user’s smartphone. The three-dimensional raw accelerometer data can be used directly as input for training the CNN model without any complex preprocessing. Reference [15] used CNN and LSTM modules to build a general deep framework, DeepConvLSTM, which can be applied to homogeneous sensor modalities and can also fuse multimodal sensors to improve performance. Reference [16] was the first to introduce semi-supervised CNNs to IMU-based HAR, using CNN-based encoder-decoder and convolutional ladder networks to learn better high-level features and employing L2 regularization to resist outliers in noisy sensor data, which reduces the amount of labeled data required by more than 90% while maintaining model performance. Although the aforementioned IMU-based HAR methods achieve high recognition accuracy, they still suffer from high model complexity and poor generalization and robustness.

Reference [17] proposed a HAR method based on IMU sensors and the lightweight neural network Res-MLP, which effectively reduces model complexity, but its generalization and robustness remain insufficient. To further enhance the recognition and generalization performance of the Res-MLP method, this paper combines Res-MLP with deep metric learning and proposes the RMDML-HAR method. The method uses Res-MLP as the feature embedding network and employs a hybrid metric loss function focused on feature discriminability, which maintains the low complexity of the Res-MLP algorithm while improving the model’s recognition and generalization performance.

1 Hybrid Metric Loss Function for Feature Discriminability

To improve the recognition and generalization performance of the HAR method based on IMU sensors and Res-MLP, this paper introduces deep metric learning^[18] and uses a metric loss function to update the model parameters. Metric learning is an optimization approach based on the idea of ‘learning to compare’ or ‘learning to measure’: it aims to compare or measure the similarity between different samples. A model trained this way learns not only how to classify the training samples but also how to measure the distance or similarity between samples, which helps it generalize to unseen test samples.

The key to classification methods based on metric learning lies in obtaining good feature embeddings. A feature embedding method that generalizes well needs to extract features with three characteristics: separability, large inter-class difference, and high intra-class consistency. Features with these three characteristics are referred to as discriminative features. Discriminative features must not only separate signal samples of different categories in feature space, but also enlarge inter-class differences (i.e., increase the distance between signal samples from different behaviors in feature space) and improve intra-class consistency (i.e., reduce the distance between different signal samples from the same behavior in feature space).

This paper proposes a discriminative feature embedding method based on hybrid metrics, which uses Res-MLP without its last fully connected layer as the backbone network for feature embedding. A hybrid loss function composed of the Softmax loss and two typical metric loss functions replaces the single Softmax loss function, effectively narrowing the distance between different samples of the same category in feature space while expanding the distance between samples of different categories. The three types of loss functions will be introduced sequentially.
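
As a rough illustration of how such a hybrid loss could be assembled, the sketch below sums a Softmax cross-entropy term, a center loss term, and a triplet loss term (the two metric losses compared in Section 3.2). It is a minimal per-sample sketch under stated assumptions: the weights `w_center` and `w_triplet`, the margin, the squared Euclidean distance, and all function names are hypothetical and are not taken from the paper.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` for the true class `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def center_loss(embedding, center):
    """Half the squared distance between an embedding and its class center."""
    return 0.5 * squared_distance(embedding, center)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def hybrid_loss(logits, label, anchor, positive, negative, center,
                w_center=0.01, w_triplet=1.0, margin=1.0):
    """Weighted sum of the three terms (weights are illustrative assumptions)."""
    return (softmax_cross_entropy(logits, label)
            + w_center * center_loss(anchor, center)
            + w_triplet * triplet_loss(anchor, positive, negative, margin))
```

The center term pulls an embedding toward the center of its own class (intra-class consistency), while the triplet term pushes it at least `margin` farther from a sample of another class than from a sample of its own class (inter-class difference).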

2 RMDML-HAR Algorithm

2.1 Algorithm Framework

2.2 Algorithm Optimization Methods

3 Experiments and Results Analysis

3.1 Model Training

This paper uses the public UCI HAR dataset^[20] for experiments. The dataset was collected from 30 volunteers aged 19-48 who wore smartphones at the waist, with the built-in IMU sensors (accelerometer and gyroscope) recording their physical activity. It covers six activities: walking, going upstairs, going downstairs, sitting, standing, and lying down. Samples from 21 volunteers are used for training, with 20% of the training samples held out as a validation set, and samples from the remaining 9 volunteers are used to test the model’s performance. The RMDML model is trained and tested with the TensorFlow framework. The optimal training parameters, obtained by repeated tuning, are shown in Table 1. During training, each fully connected layer uses L2 regularization with a coefficient of 0.001 to avoid overfitting. In addition, the learning rate is adjusted automatically as training proceeds, with an initial value of 0.01 and an adjustment factor of 0.1, so that it follows the sequence {0.01, 0.001, 0.0001, …}. This lets the model converge quickly at the start of training and take smaller update steps in the later stages, yielding a better-performing model.
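
The learning rate sequence and the L2 penalty described above can be sketched as follows. This is only one plausible reading: the text does not say what triggers each adjustment, so the fixed interval `epochs_per_step` is a hypothetical parameter (a plateau-based trigger would be equally consistent with the description), and both function names are illustrative.

```python
def step_decay_lr(epoch, initial_lr=0.01, factor=0.1, epochs_per_step=30):
    """Learning rate after `epoch` epochs under a fixed-interval step decay.

    `epochs_per_step` is an assumed value; only the initial rate (0.01)
    and the adjustment factor (0.1) come from the text.
    """
    return initial_lr * factor ** (epoch // epochs_per_step)


def l2_penalty(weights, coefficient=0.001):
    """L2 regularization term: coefficient times the sum of squared weights."""
    return coefficient * sum(w * w for w in weights)


if __name__ == "__main__":
    # Print the rate at the start of each decay interval.
    for epoch in (0, 30, 60):
        print(epoch, step_decay_lr(epoch))
```

With these defaults the rate steps through 0.01, 0.001, and 0.0001 at epochs 0, 30, and 60 (up to floating-point rounding), matching the sequence given in the text.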

3.2 Performance Analysis

(1) Baseline Methods

Currently, HAR methods based on IMU sensors and deep learning techniques mainly use networks such as CNN and LSTM. This paper selects four deep learning baseline methods for comparison, including three mainstream methods and the Res-MLP method that does not use metric loss functions. Below is a brief description of the comparison methods.

C4M4BL^[21]: Uses CNN to build a neural network that can effectively extract spatial features of behaviors.

BLSTM^[22]: Unlike a standard LSTM, BLSTM processes the input sequence in both the forward and backward directions, which gives better performance and effectively extracts the temporal features of behaviors.

CNN + LSTM^[23]: Combines CNN and LSTM to extract both spatial and temporal features simultaneously.

Res-MLP: A lightweight HAR method that uses a cross-entropy loss function, effectively balancing model complexity and accuracy.

(2) Feature Embedding Performance Analysis and Feature Visualization

Performance can be evaluated in many ways, and the average recognition accuracy does not directly measure the quality of different feature embeddings. This paper therefore introduces the silhouette coefficient, commonly used in unsupervised learning, as a quantitative evaluation index that directly measures the intra-class consistency and inter-class difference of the features output by a feature embedding method. The larger the silhouette coefficient, the higher the discriminability of the feature embedding method. The formula for calculating the silhouette coefficient is as follows:
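
A minimal sketch, assuming the standard definition of the silhouette coefficient: for sample i, a(i) is the mean distance to the other samples of its own class, b(i) is the smallest mean distance to the samples of any other class, and s(i) = (b(i) - a(i)) / max(a(i), b(i)); the reported value is the mean of s(i) over all samples. The function names and the choice of Euclidean distance are assumptions for illustration, not details taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_coefficient(features, labels):
    """Mean silhouette value of `features` grouped by class `labels`."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("at least two classes are required")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for a single-sample class
            continue
        # a: mean distance to the other samples of the same class
        a = sum(euclidean(features[i], features[j]) for j in own) / len(own)
        # b: smallest mean distance to the samples of any other class
        b = min(
            sum(euclidean(features[i], features[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)
```

The value lies in [-1, 1]: compact, well-separated classes score close to 1, while classes that overlap in feature space score near 0 or below.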

Figure 3 shows the feature visualization of RMDML and the four baseline methods on the test set, where 0, 1, 2, 3, 4, and 5 represent walking, going upstairs, going downstairs, sitting, standing, and lying down, respectively. The RMDML method distinctly separates the categories, with only the features of going upstairs and going downstairs overlapping, because their three-axis acceleration and three-axis angular velocity are similar. Moreover, compared to the four baseline methods, the feature clusters of the RMDML method have more clearly defined outlines, indicating that the features extracted by the RMDML method have higher discriminability.

(3) Recognition Performance Comparison of Different Methods

To verify the recognition and generalization performance of the RMDML algorithm, comparative experiments were conducted with the four baseline methods. The recognition results of the different methods are shown in Table 2: the RMDML method, which uses the hybrid metric loss function, outperforms the other algorithms in accuracy, precision, recall, F-score, and silhouette coefficient. RMDML also exceeds Res-MLP, which is attributed to the hybrid metric loss function’s ability to extract more discriminative features: it increases the distance between features of different behaviors and reduces the distance between features of the same behavior, enabling the model to achieve higher recognition and generalization performance.
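
For reference, the classification metrics named above can be computed from true and predicted labels as in the sketch below. It assumes macro averaging over the classes, which the text does not specify, and the function name `macro_metrics` is hypothetical.

```python
def macro_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall, and F-score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f_scores = [], [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_score = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f_score": sum(f_scores) / num_classes,
    }
```

For the six UCI HAR activities, `num_classes` would be 6, with labels 0 to 5 as in the feature visualization.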

(4) Recognition Performance Comparison of Different Loss Functions

To verify the effectiveness of the proposed hybrid metric loss function, it was compared with the Softmax loss function, the center loss function, and the triplet loss function used individually. Table 3 shows the impact of the different loss functions on the classification accuracy of the RMDML model: the hybrid metric loss function performs better than the Softmax loss function, center loss function, or triplet loss function alone.

4 Conclusion

This paper introduces deep metric learning to improve the recognition accuracy and generalization performance of the existing Res-MLP-based HAR algorithm, and proposes the RMDML-HAR algorithm. The method first uses Res-MLP as the feature embedding network, then updates the model parameters using the proposed hybrid metric loss function, which focuses on discriminative features, and finally uses a fully connected layer as the classifier. Experiments on the public UCI HAR dataset show that the proposed algorithm improves recognition accuracy, precision, recall, F-score, and silhouette coefficient compared to several common HAR algorithms, indicating that the RMDML algorithm has high recognition performance and strong generalization capability.

References:

[1] Kong Lan, Zhou Ting, Zhu Lei. 5G Empowers New Type of Smart Cities [J]. China Telecom Industry, 2022, 253(01): 28-31.

[2] Panwar M, Mehra P S. Hand Gesture Recognition for Human Computer Interaction [C]. 2011 International Conference on Image Information Processing. IEEE, 2011: 1-7.

[3] Ahmadi A, Mitchell E, Richter C, et al. Toward Automatic Activity Classification and Movement Assessment During a Sports Training Session [J]. IEEE Internet of Things Journal, 2014, 2(1): 23-32.

[4] Bianchi V, Ciampolini P, De Munari I. RSSI-Based Indoor Localization and Identification for ZigBee Wireless Sensor Networks in Smart Homes [J]. IEEE Transactions on Instrumentation and Measurement, 2018, 68(2): 566-575.

[5] Bisio I, Delfino A, Lavagetto F, et al. Enabling IoT for In-Home Rehabilitation: Accelerometer Signals Classification Methods for Activity and Movement Recognition [J]. IEEE Internet of Things Journal, 2016, 4(1): 135-146.

[6] Ehatisham-ul-Haq M, Azam M A, Loo J, et al. Authentication of Smartphone Users Based on Activity Recognition and Mobile Sensing [J]. Sensors, 2017, 17(9): 2043.

[7] Dong J, Cai Z. User Authentication Using Motion Sensor Data from Both Wearables and Smartphones [C]. Biometric Recognition: 11th Chinese Conference, CCBR 2016, Chengdu, China, October 14-16, 2016, Proceedings 11. Springer International Publishing, 2016: 756-764.

[8] Gui G, Liu M, Tang F, et al. 6G: Opening New Horizons for Integration of Comfort, Security, and Intelligence [J]. IEEE Wireless Communications, 2020, 27(5): 126-132.

[9] Wang Y, Gui G, Ohtsuki T, et al. Multi-task Learning for Generalized Automatic Modulation Classification Under Non-Gaussian Noise with Varying SNR Conditions [J]. IEEE Transactions on Wireless Communications, 2021, 20(6): 3587-3596.

[10] Wang J, Gui G, Ohtsuki T, et al. Compressive Sampled CSI Feedback Method Based on Deep Learning for FDD Massive MIMO Systems [J]. IEEE Transactions on Communications, 2021, 69(9): 5873-5885.

[11] Peng Y, Liu P, Wang Y, et al. Radio Frequency Fingerprint Identification Based on Slice Integration Cooperation and Heat Constellation Trace Figure [J]. IEEE Wireless Communications Letters, 2021, 11(3): 543-547.

[12] Lin Y, Tu Y, Dou Z, et al. Contour Stella Image and Deep Learning for Signal Recognition in the Physical Layer [J]. IEEE Transactions on Cognitive Communications and Networking, 2020, 7(1): 34-46.

[13] Bashar S K, Al Fahim A, Chon K H. Smartphone Based Human Activity Recognition with Feature Selection and Dense Neural Network [C]. 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2020: 5888-5891.

[14] Xu W, Pang Y, Yang Y, et al. Human Activity Recognition Based on Convolutional Neural Network [C]. 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018: 165-170.

[15] Ordóñez F J, Roggen D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition [J]. Sensors, 2016, 16(1): 115.

[16] Yao L, Nie F, Sheng Q Z, et al. Learning from Less for Better: Semi-supervised Activity Recognition via Shared Structure Discovery [C]. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 2016: 13-24.

[17] Shi S, Wang Y, Dong H, et al. Smartphone-Aided Human Activity Recognition Method Using Residual Multi-Layer Perceptron [C]. IEEE INFOCOM 2022-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2022: 1-6.

[18] Yang L, Jin R, Sukthankar R, et al. An Efficient Algorithm for Local Distance Metric Learning [C]. AAAI. 2006, 2: 543-548.

[19] Schroff F, Kalenichenko D, Philbin J. Facenet: A Unified Embedding for Face Recognition and Clustering [C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 815-823.

[20] Anguita D, Ghio A, Oneto L, et al. A Public Domain Dataset for Human Activity Recognition Using Smartphones [C]. Esann. 2013, 3: 3.

[21] Shan C Y, Han P Y, Yin O S. Deep Analysis for Smartphone-Based Human Activity Recognition [C]. 2020 8th International Conference on Information and Communication Technology (ICoICT). IEEE, 2020: 1-5.

[22] Yu S, Qin L. Human Activity Recognition with Smartphone Inertial Sensors Using Bidirectional LSTM Networks [C]. 2018 3rd International Conference on Mechanical, Control and Computer Engineering (ICMCCE). IEEE, 2018: 219-224.

[23] Mekruksavanich S, Jitpattanakul A. Smartwatch-Based Human Activity Recognition Using Hybrid LSTM Network [C]. 2020 IEEE SENSORS. IEEE, 2020: 1-4.

Author Information

Shi Shang (orcid.org/0009-0001-8285-2471): A master’s student in Electronic Information at the School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications. His main research directions include human activity recognition and object detection.

He Zhengran: A master’s student in Signal and Information Processing at Nanjing University of Posts and Telecommunications. His main research direction is passive sensing based on CSI in WiFi environments. During his graduate studies, he published one paper as first author at the IEEE VTC conference and one at the IEEE Globecom conference, has one paper pre-accepted in the IEEE IoT journal, and won second place in the 2020 3S competition.

Dong Heng: An associate professor at Nanjing University of Posts and Telecommunications. His main research direction is broadband wireless communication theory and technology.
