Squat Detector Based on OpenCV and TensorFlow


In this issue, we will show how to use OpenCV and TensorFlow to implement squat detection.

During quarantine, our physical activity is very limited, which is not good. When exercising at home, we have to stay constantly focused just to keep track of how much we have done. So we want to build an automated system that counts the exercise for us. Since a squat has clear stages and large changes in body position between them, counting squats should be relatively simple.

Let’s try to implement it together!

Data Collection

A Raspberry Pi with a camera makes capturing images very convenient, and OpenCV can then write the captured frames to the file system, as in the sketch below.
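A minimal capture sketch, assuming the camera is exposed as device 0 and a frames/ output directory already exists (both assumptions, not from the original article):

import cv2 as cv

cap = cv.VideoCapture(0)  # Raspberry Pi camera (device index assumed)
for i in range(300):      # capture a fixed number of frames per session
    ok, frame = cap.read()
    if not ok:
        break
    cv.imwrite('frames/frame_%05d.png' % i, frame)  # write frame to the file system
cap.release()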

Motion Recognition

Initially, we planned to use image segmentation to extract the person. However, image segmentation is a heavy operation, especially with the limited resources of a Raspberry Pi.

Moreover, segmenting single images ignores an important fact: we are working with a sequence of frames, not one image. That sequence carries useful information, and we will exploit it later.

Therefore, we settled on OpenCV background subtraction, which gives reliable results.

Background Subtraction

First, create a background subtractor:

backSub = cv.createBackgroundSubtractorMOG2()

Add image frames to it:

mask = backSub.apply(frame)

Finally, we can get an image with the body contour:

[Image: foreground mask with the body contour]

Then we dilate the mask to emphasize the contour:

mask = cv.dilate(mask, None, iterations=3)

Applying this algorithm to all image frames yields the posture in each image (see the loop below). Afterwards, we will classify each frame into one of three situations: standing, squatting, and none.
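Putting the two steps together over the whole sequence might look like this; the frames list is an assumption (e.g. the images captured earlier):

masks = []
for frame in frames:
    mask = backSub.apply(frame)                 # foreground mask
    mask = cv.dilate(mask, None, iterations=3)  # emphasize the contour
    masks.append(mask)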

Next, we need to extract the person from the image, and OpenCV can help us find the corresponding contours:

cnts, _ = cv.findContours(img, cv.RETR_CCOMP, cv.CHAIN_APPROX_SIMPLE)

This method is more or less able to extract the person as the largest contour, but unfortunately the result is not stable: the detected largest contour may, for example, cover the person’s body but miss their feet.

Nevertheless, having a series of images is very helpful. We typically perform squats in the same spot, so we can assume that all movement happens within a certain, stable area. To find it, we iteratively build a bounding rectangle over the frames, enlarging it with the bounding rectangle of each frame’s largest contour when necessary, as shown in the sketch below.
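Here is a minimal sketch of that accumulation; union_rect and masks are illustrative names, not from the original code:

# Grow a running bounding rectangle with each frame's largest contour.
def union_rect(acc, rect):
    if acc is None:
        return rect
    x = min(acc[0], rect[0])
    y = min(acc[1], rect[1])
    x2 = max(acc[0] + acc[2], rect[0] + rect[2])
    y2 = max(acc[1] + acc[3], rect[1] + rect[3])
    return (x, y, x2 - x, y2 - y)

bounding = None
for m in masks:
    cnts, _ = cv.findContours(m, cv.RETR_CCOMP, cv.CHAIN_APPROX_SIMPLE)
    if not cnts:
        continue
    largest = max(cnts, key=cv.contourArea)
    bounding = union_rect(bounding, cv.boundingRect(largest))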

Here is an example:

• The largest contour is red

• The contour bounding rectangle is blue

• The image bounding rectangle is green

[Image: the largest contour (red), its bounding rectangle (blue), and the image bounding rectangle (green)]

With the background subtraction and contour extraction above, we are well prepared for further processing.

Classification

Next, we crop the bounding rectangle from each mask and resize the crop to a 64×64 square, as sketched below.
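A minimal sketch of this step, assuming the bounding rectangle and mask variables from above:

x, y, w, h = bounding
crop = mask[y:y + h, x:x + w]       # crop the person region from the mask
sample = cv.resize(crop, (64, 64))  # resize to the classifier input size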

The resulting masks serve as input for the classifier:

Standing posture:

[Image: mask of a standing posture]

Squatting posture:

[Image: mask of a squatting posture]

Next, we will use Keras with TensorFlow for classification.

Initially, we used the classic LeNet-5 model, which performed well. After reading some articles about LeNet-5 variants, we decided to try simplifying the architecture.

It turns out that the simplified CNN has almost the same accuracy in the current example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD

model = Sequential([
    Convolution2D(8, (5, 5), activation='relu', input_shape=input_shape),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(3, activation='softmax')
])
model.compile(loss="categorical_crossentropy",
              optimizer=SGD(lr=0.01),
              metrics=["accuracy"])

The accuracy after 10 epochs was 86%, after 20 epochs 94%, and after 30 epochs 96%. Training further could lead to overfitting and a drop in accuracy, so next we apply the model in real life.
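For reference, a training call of the usual Keras shape is sketched below; the dataset variables are assumptions, since the article only reports the resulting accuracies:

# Hypothetical training call; x_train/y_train and the validation split are assumed.
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=30, batch_size=32)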

Model Application

We will run it on the Raspberry Pi.

Load the model:

import tensorflow as tf

with open(MODEL_JSON, 'r') as f:
    model_data = f.read()
model = tf.keras.models.model_from_json(model_data)
model.load_weights(MODEL_H5)
graph = tf.get_default_graph()  # TF 1.x graph handle, used during prediction

And classify the squat mask. The original fragment contains a return statement, so it is shown wrapped in a helper function here; the function name is an assumption:

def classify(path, f):
    # Load the 64x64 grayscale mask and predict: standing, squatting, or none
    img = cv.imread(path + f, cv.IMREAD_GRAYSCALE)
    img = np.reshape(img, [1, 64, 64, 1])
    with graph.as_default():
        c = model.predict_classes(img)
    return c[0] if c.size else None  # c.size avoids numpy truthiness pitfalls

On the Raspberry Pi, a classification call on the 64×64 input takes about 60-70 milliseconds, which is close to real time.

Finally, let’s integrate all the above parts into one application:

• GET / – the application page

• GET /status – get the current status, squat count, and frame count

• POST /start – start exercising

• POST /stop – complete the exercise

• GET /stream – video stream from the camera
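A minimal sketch of how a subset of these endpoints might be wired up; the article does not name a web framework, so Flask is an assumption here:

from flask import Flask, jsonify

app = Flask(__name__)
state = {"status": "idle", "squats": 0, "frames": 0}

@app.route("/status")
def status():
    # Current status, squat count, and frame count
    return jsonify(state)

@app.route("/start", methods=["POST"])
def start():
    state["status"] = "exercising"
    return "", 204

@app.route("/stop", methods=["POST"])
def stop():
    state["status"] = "done"
    return "", 204

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)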
