Human Activity Recognition Based on LSTM-CNN

Source: DeepHub IMBA



This article is about 3,400 words long; allow a little over 10 minutes to read it.
This article will guide you to recognize human activities using raw data generated by mobile sensors.

Human Activity Recognition (HAR) is a method that uses Artificial Intelligence (AI) to recognize human activities from raw data generated by activity recording devices such as smartwatches. When people perform certain actions, the sensors worn by them (smartwatches, wristbands, dedicated devices, etc.) generate signals. These information-gathering sensors include accelerometers, gyroscopes, and magnetometers. Human activity recognition has a wide range of applications, from assisting patients and disabled individuals to fields like gaming that heavily rely on analyzing motor skills. We can roughly categorize these human activity recognition technologies into two types: fixed sensors and mobile sensors. In this article, we will use raw data generated by mobile sensors to recognize human activities.

In this article, I will use LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network) to recognize the following human activities:

Going Downstairs
Going Upstairs
Jogging
Sitting
Standing
Walking

Overview

You might wonder why we use an LSTM-CNN model instead of basic machine learning methods.

Classical machine learning methods for human activity recognition rely heavily on heuristic, hand-crafted feature extraction. What we want here is end-to-end learning, which removes that manual feature engineering step.

The model I will use is a deep neural network that combines LSTM and CNN layers. It extracts activity features and classifies them using only its learned parameters.

Here we use the WISDM dataset, which contains 1,098,209 samples. After training, the model reaches an F1 score of 0.96 on the training set and 0.89 on the test set.

Import Libraries

First, we will import all the necessary libraries that we will need.

from pandas import read_csv, unique
import numpy as np
from scipy.interpolate import interp1d
from scipy.stats import mode
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from tensorflow import stack
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GlobalAveragePooling1D, BatchNormalization, MaxPool1D, Reshape, Activation
from tensorflow.keras.layers import Conv1D, LSTM
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

We will use scikit-learn, TensorFlow, Keras, SciPy, and NumPy to build the model and preprocess the data, pandas to load the data, and matplotlib to visualize it.

Dataset Loading and Visualization

The WISDM dataset was recorded by accelerometers in mobile devices that participants carried at the waist. Data collection was supervised to ensure data quality. The file we will use is WISDM_ar_v1.1_raw.txt. Using pandas, we can load the dataset into a DataFrame, as shown in the code below:

def convert_to_float(x):
    # Return NaN for values that cannot be parsed as floats
    try:
        return np.float64(x)
    except (TypeError, ValueError):
        return np.nan

def read_data(filepath):
    df = read_csv(filepath, header=None,
                  names=['user-id', 'activity', 'timestamp', 'X', 'Y', 'Z'])
    # Remove the trailing ';' from the last column and convert it to float
    df['Z'] = df['Z'].replace(regex=True, to_replace=r';', value=r'')
    df['Z'] = df['Z'].apply(convert_to_float)
    return df

df = read_data('Dataset/WISDM_ar_v1.1/WISDM_ar_v1.1_raw.txt')
df

plt.figure(figsize=(15, 5))
plt.xlabel('Activity Type')
plt.ylabel('Training examples')
df['activity'].value_counts().plot(kind='bar',
                                   title='Training examples by Activity Types')
plt.show()
plt.figure(figsize=(15, 5))
plt.xlabel('User')
plt.ylabel('Training examples')
df['user-id'].value_counts().plot(kind='bar',
                                  title='Training examples by user')
plt.show()

Now I will visualize the accelerometer data collected on the three axes.

def axis_plot(ax, x, y, title):
    ax.plot(x, y, 'r')
    ax.set_title(title)
    ax.xaxis.set_visible(False)
    # Pad the y-range by one standard deviation on each side
    ax.set_ylim([min(y) - np.std(y), max(y) + np.std(y)])
    ax.set_xlim([min(x), max(x)])
    ax.grid(True)

for activity in df['activity'].unique():
    # Plot the first 180 readings of each activity
    limit = df[df['activity'] == activity][:180]
    fig, (ax0, ax1, ax2) = plt.subplots(nrows=3, sharex=True, figsize=(15, 10))
    axis_plot(ax0, limit['timestamp'], limit['X'], 'x-axis')
    axis_plot(ax1, limit['timestamp'], limit['Y'], 'y-axis')
    axis_plot(ax2, limit['timestamp'], limit['Z'], 'z-axis')
    plt.subplots_adjust(hspace=0.2)
    fig.suptitle(activity)
    plt.subplots_adjust(top=0.9)
    plt.show()

Data Preprocessing

Data preprocessing is a very important task that enables our model to better utilize our raw data. The data preprocessing methods used here include:

Label Encoding
Linear Interpolation
Data Segmentation
Normalization
Time Series Segmentation
One-Hot Encoding

Label Encoding

Since the model cannot accept non-numeric labels, we add a numeric encoding of the ‘activity’ column in a new column named ‘activityEncode’. The labels are converted to numbers as shown below (these are the values we want to predict):

Going Downstairs [0]
Jogging [1]
Sitting [2]
Standing [3]
Going Upstairs [4]
Walking [5]

label_encode = LabelEncoder()
df['activityEncode'] = label_encode.fit_transform(df['activity'].values.ravel())
df
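
To make the mapping concrete, here is a minimal NumPy sketch of the same idea: sort the distinct labels and replace each label with its position in that sorted list. The `activities` array is a made-up sample, and the label strings in it are assumed for illustration only; this is not the scikit-learn implementation itself.

```python
import numpy as np

# Hypothetical sample of the 'activity' column (not real WISDM rows)
activities = np.array(['Walking', 'Jogging', 'Sitting', 'Walking',
                       'Upstairs', 'Downstairs', 'Standing'])

# np.unique returns the sorted distinct labels; with return_inverse=True it
# also returns, for every entry, its index within that sorted array
classes, encoded = np.unique(activities, return_inverse=True)

print(list(classes))
print(encoded.tolist())
```

Because the classes are sorted alphabetically, the numeric codes depend only on the label names, not on the order in which they appear in the file.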

Linear Interpolation

Linear interpolation avoids losing data to NaN values that appear during collection: missing values are filled in by interpolating between known ones. Although this dataset contains only one NaN value, we still implement the step for demonstration.

interpolation_fn = interp1d(df['activityEncode'], df['Z'], kind='linear')
null_list = df[df['Z'].isnull()].index.tolist()
for i in null_list:
    # Look up the interpolated Z value for this row's encoded activity
    y = df['activityEncode'][i]
    value = interpolation_fn(y)
    df['Z'] = df['Z'].fillna(value)
    print(value)
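
As an alternative view of the same step, the sketch below fills missing readings by interpolating along the sample order rather than against the encoded activity. This is an assumption on my part, not the code used in this article; `fill_nan_linear` is a hypothetical helper and the readings are made up.

```python
import numpy as np

def fill_nan_linear(values):
    """Fill NaNs by linear interpolation over the sample index."""
    values = np.asarray(values, dtype=np.float64)
    idx = np.arange(len(values))
    known = ~np.isnan(values)
    filled = values.copy()
    # np.interp evaluates the piecewise-linear curve through the known points
    filled[~known] = np.interp(idx[~known], idx[known], values[known])
    return filled

z = np.array([0.5, 1.0, np.nan, 2.0, 2.5])  # made-up readings
print(fill_nan_linear(z))
```

With pandas, `df['Z'].interpolate(method='linear')` applies the same kind of fill directly to a column.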

Data Segmentation

The data is split by user ID so that no user's recordings appear in both sets. Users with IDs less than or equal to 27 go into the training set, and the rest go into the test set.

# .copy() gives independent frames, so later column assignments are unambiguous
df_test = df[df['user-id'] > 27].copy()
df_train = df[df['user-id'] <= 27].copy()
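
A quick sanity check for a user-level split is to confirm that the two sets share no users. The sketch below does this on a made-up array of user IDs; the variable names are illustrative only.

```python
import numpy as np

# Made-up user IDs, one per accelerometer row
user_ids = np.array([1, 1, 27, 27, 28, 30, 36, 5])

train_mask = user_ids <= 27
train_users = set(user_ids[train_mask].tolist())
test_users = set(user_ids[~train_mask].tolist())

# A user-level split means the two sets have no user in common
print(sorted(train_users), sorted(test_users))
print(train_users.isdisjoint(test_users))
```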

Normalization

Before training, the features need to be normalized to the range 0 to 1. We use min-max scaling:

df_train['X'] = (df_train['X']-df_train['X'].min())/(df_train['X'].max()-df_train['X'].min())
df_train['Y'] = (df_train['Y']-df_train['Y'].min())/(df_train['Y'].max()-df_train['Y'].min())
df_train['Z'] = (df_train['Z']-df_train['Z'].min())/(df_train['Z'].max()-df_train['Z'].min())
df_train
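
The scaling applied to each axis is x' = (x - min) / (max - min). The sketch below separates computing the minimum and maximum from applying them, so that statistics taken from the training data can be reused on other data. Reusing training statistics is an assumption of this sketch; later in this article the test set is rescaled with its own minimum and maximum. The helper names and values are hypothetical.

```python
import numpy as np

def fit_min_max(train_values):
    """Return the (min, max) pair computed on the training values."""
    return float(np.min(train_values)), float(np.max(train_values))

def apply_min_max(values, lo, hi):
    """Scale values with a previously computed (min, max) pair."""
    return (np.asarray(values, dtype=np.float64) - lo) / (hi - lo)

train_x = np.array([-10.0, 0.0, 5.0, 10.0])  # made-up training readings
test_x = np.array([-5.0, 12.0])              # made-up test readings

lo, hi = fit_min_max(train_x)
train_scaled = apply_min_max(train_x, lo, hi)
test_scaled = apply_min_max(test_x, lo, hi)
print(train_scaled, test_scaled)
```

Note that values outside the training range map outside [0, 1], which is the price of using one consistent scale for both sets.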

Time Series Segmentation

Since we are dealing with time series data, we need a function that cuts the records into fixed-length segments and assigns a label to each one. The function below builds x_train and y_train: every segment covers 80 time steps, consecutive segments start 40 steps apart, and each segment is labeled with the most frequent activity inside it.

def segments(df, time_steps, step, label_name):
    N_FEATURES = 3
    segments = []
    labels = []
    for i in range(0, len(df) - time_steps, step):
        xs = df['X'].values[i:i+time_steps]
        ys = df['Y'].values[i:i+time_steps]
        zs = df['Z'].values[i:i+time_steps]
        # Most frequent label in the window; np.atleast_1d copes with SciPy
        # versions that return either a scalar or a length-1 array
        label = np.atleast_1d(mode(df[label_name].values[i:i+time_steps])[0])[0]
        segments.append([xs, ys, zs])
        labels.append(label)
    # Note: the (n, 3, time_steps) array is reshaped here, not transposed
    reshaped_segments = np.asarray(segments, dtype=np.float32).reshape(-1, time_steps, N_FEATURES)
    labels = np.asarray(labels)
    return reshaped_segments, labels
TIME_PERIOD = 80
STEP_DISTANCE = 40
LABEL = 'activityEncode'
x_train, y_train = segments(df_train, TIME_PERIOD, STEP_DISTANCE, LABEL)
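
To illustrate the windowing on its own, here is a small NumPy sketch on synthetic data. It slices an (n, 3) array directly, so each row of a window is one (x, y, z) reading, and it picks the majority label with `np.bincount`. Both choices are assumptions of this sketch and differ slightly from the function above; `sliding_windows` is a hypothetical name.

```python
import numpy as np

def sliding_windows(xyz, labels, time_steps, step):
    """xyz: (n, 3) readings; labels: (n,) non-negative integer labels."""
    windows, window_labels = [], []
    for start in range(0, len(xyz) - time_steps, step):
        end = start + time_steps
        windows.append(xyz[start:end])  # shape (time_steps, 3)
        # Majority label in the window (ties resolve to the smallest label)
        window_labels.append(np.bincount(labels[start:end]).argmax())
    return np.asarray(windows, dtype=np.float32), np.asarray(window_labels)

n = 400
xyz = np.arange(n * 3, dtype=np.float32).reshape(n, 3)  # synthetic readings
labels = np.repeat([0, 5], n // 2)                      # 200 rows of each label

x_win, y_win = sliding_windows(xyz, labels, time_steps=80, step=40)
print(x_win.shape, y_win.tolist())
```

With a step of 40 and a window of 80, consecutive windows overlap by half, and the number of windows equals `len(range(0, n - 80, 40))`.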

Thus, the shapes of x_train and y_train become:

print('x_train shape:', x_train.shape)
print('Training samples:', x_train.shape[0])
print('y_train shape:', y_train.shape)
x_train shape: (20334, 80, 3)
Training samples: 20334
y_train shape: (20334,)

Here, some data that will be used later is also stored: the time period (time_period), the number of sensors (sensors), and the number of classes (num_classes).

time_period, sensors = x_train.shape[1], x_train.shape[2]
num_classes = label_encode.classes_.size
print(list(label_encode.classes_))
['Downstairs', 'Jogging', 'Sitting', 'Standing', 'Upstairs', 'Walking']

Next, we flatten each segment into a single vector of 240 values (80 time steps × 3 axes) to use as the Keras input:

input_shape = time_period * sensors
x_train = x_train.reshape(x_train.shape[0], input_shape)
print("Input Shape: ", input_shape)
print("Input Data Shape: ", x_train.shape)
Input Shape: 240
Input Data Shape: (20334, 240)

Then all data is converted to float32.

x_train = x_train.astype('float32')
y_train = y_train.astype('float32')

One-Hot Encoding

This is the last step of data preprocessing: we one-hot encode the labels and store them in y_train_hot.

y_train_hot = to_categorical(y_train, num_classes)
print("y_train shape: ", y_train_hot.shape)
y_train shape: (20334, 6)
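
For readers unfamiliar with one-hot encoding, the following NumPy sketch shows the transformation on three made-up labels. `one_hot` is a hypothetical helper written for illustration, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    labels = np.asarray(labels).astype(int)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype=np.float32)[labels]

y = np.array([0.0, 5.0, 2.0], dtype=np.float32)  # made-up encoded labels
y_hot = one_hot(y, 6)
print(y_hot.shape)
print(y_hot[1])
```

Taking `np.argmax` along axis 1 recovers the original integer labels, which is how the predictions are decoded later in this article.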

Model

The model we use is a sequential model consisting of 8 layers. The first two layers of the model are LSTM layers, each with 32 neurons, using the ReLU activation function. Then there are convolutional layers for extracting spatial features.

At the junction between the LSTM and convolutional layers, the LSTM output has to be reshaped: it has 3 dimensions (number of samples, time steps, features), whereas the convolutional layer here is fed a 4-dimensional input (number of samples, 1, time steps, features).

The first convolutional layer has 64 filters and the second has 192. Between them, a max pooling layer performs down-sampling. A Global Average Pooling (GAP) layer then converts the multi-dimensional feature maps into a 1-D feature vector; because this layer has no trainable parameters, it keeps the total parameter count down. Finally, a batch normalization (BN) layer helps the model converge.

The last layer is the output layer: a fully connected layer with 6 neurons followed by a softmax activation, which outputs the probability of each class.

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(input_shape,1), activation='relu'))
model.add(LSTM(32,return_sequences=True, activation='relu'))
model.add(Reshape((1, 240, 32)))
model.add(Conv1D(filters=64,kernel_size=2, activation='relu', strides=2))
model.add(Reshape((120, 64)))
model.add(MaxPool1D(pool_size=4, padding='same'))
model.add(Conv1D(filters=192, kernel_size=2, activation='relu', strides=1))
model.add(Reshape((29, 192)))
model.add(GlobalAveragePooling1D())
model.add(BatchNormalization(epsilon=1e-06))
model.add(Dense(6))
model.add(Activation('softmax'))
print(model.summary())
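
The target shapes passed to the Reshape layers, (120, 64) and (29, 192), follow from how each layer changes the number of time steps. The sketch below reproduces that arithmetic in plain Python, assuming the Keras defaults of 'valid' padding for Conv1D and a pooling stride equal to pool_size for MaxPool1D; the helper names are hypothetical.

```python
import math

def conv1d_out_len(length, kernel_size, strides):
    """Output length of a 1-D convolution with 'valid' padding."""
    return (length - kernel_size) // strides + 1

def pool1d_same_out_len(length, strides):
    """Output length of 1-D pooling with 'same' padding."""
    return math.ceil(length / strides)

lstm_steps = 240                                   # LSTM output: (240, 32)
after_conv1 = conv1d_out_len(lstm_steps, 2, 2)     # Conv1D(64, kernel 2, stride 2)
after_pool = pool1d_same_out_len(after_conv1, 4)   # MaxPool1D(pool_size=4)
after_conv2 = conv1d_out_len(after_pool, 2, 1)     # Conv1D(192, kernel 2, stride 1)

print(after_conv1, after_pool, after_conv2)
```

Global average pooling then collapses the remaining time steps, leaving one value per filter for the dense output layer.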

Training and Results

After training, the model achieved an accuracy of 98.02% and a loss of 0.0058. The training F1 score is 0.96.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(x_train,
                    y_train_hot,
                    batch_size=192,
                    epochs=100)

Visualize the accuracy and loss changes during training.

plt.figure(figsize=(6, 4))
plt.plot(history.history['accuracy'], 'r', label='Accuracy of training data')
plt.plot(history.history['loss'], 'r--', label='Loss of training data')
plt.title('Model Accuracy and Loss')
plt.ylabel('Accuracy and Loss')
plt.xlabel('Training Epoch')
plt.ylim(0)
plt.legend()
plt.show()
y_pred_train = model.predict(x_train)
max_y_pred_train = np.argmax(y_pred_train, axis=1)
print(classification_report(y_train, max_y_pred_train))

Next, evaluate the model on the test dataset. Before doing so, apply the same preprocessing to the test set.

df_test['X'] = (df_test['X']-df_test['X'].min())/(df_test['X'].max()-df_test['X'].min())
df_test['Y'] = (df_test['Y']-df_test['Y'].min())/(df_test['Y'].max()-df_test['Y'].min())
df_test['Z'] = (df_test['Z']-df_test['Z'].min())/(df_test['Z'].max()-df_test['Z'].min())
x_test, y_test = segments(df_test,
                          TIME_PERIOD,
                          STEP_DISTANCE,
                          LABEL)
x_test = x_test.reshape(x_test.shape[0], input_shape)
x_test = x_test.astype('float32')
y_test = y_test.astype('float32')
y_test = to_categorical(y_test, num_classes)

On the test dataset, the model achieved an accuracy of 89.14% and a loss of 0.4647. The test F1 score is 0.89.

score = model.evaluate(x_test, y_test)
print("Accuracy:", score[1])
print("Loss:", score[0])

Next, plot the confusion matrix to better understand the predictions on the test dataset.

predictions = model.predict(x_test)
predictions = np.argmax(predictions, axis=1)
y_test_pred = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_test_pred, predictions)
cm_disp = ConfusionMatrixDisplay(confusion_matrix= cm)
cm_disp.plot()
plt.show()

We can also evaluate the classification report of the model on the test dataset.

print(classification_report(y_test_pred, predictions))
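
The per-class F1 scores in that report can be derived from the confusion matrix itself. The sketch below shows the calculation on a made-up 2×2 matrix, assuming the scikit-learn convention that rows are true classes and columns are predicted classes; `per_class_f1` is a hypothetical helper and the numbers are not results from this model.

```python
import numpy as np

def per_class_f1(cm):
    """Per-class F1 from a confusion matrix (rows: true, columns: predicted)."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm).copy()
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurred
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    return np.divide(2 * precision * recall, denom,
                     out=np.zeros_like(tp), where=denom > 0)

cm = np.array([[8, 2],
               [1, 9]])  # made-up counts
f1 = per_class_f1(cm)
print(f1)
```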

Conclusion

The LSTM-CNN model performs well on this task: it reaches an F1 score of 0.89 on the test users without any manually engineered features. The code for this article can be found on GitHub.

https://github.com/Tanny1810/Human-Activity-Recognition-LSTM-CNN

You can try to implement it yourself and improve the F1 score by optimizing the model.

Additionally, this model is based on the paper “LSTM-CNN Architecture for Human Activity Recognition” by Kun Xia, Jianguang Huang, and Hanyu Wang, published in IEEE Access.

https://ieeexplore.ieee.org/abstract/document/9043535

Author: Tanmay Chauhan

Editor: Huang Jiyan