This article is adapted from OpenCV Academy.
Introduction to FaceNet Network
FaceNet is a face recognition model proposed by Google. Its biggest difference from other face recognition models is that it does not take features from an intermediate bottleneck or classification layer; instead, it directly embeds face features in a low-dimensional Euclidean space, which makes later recognition, classification, and similarity comparison very convenient. Compared to other deep learning methods, FaceNet needs only 128 bytes to represent a face. The design goals of the FaceNet network are as follows:
1. Verification - Determine whether two faces are the same person
2. Recognition - Identify who a given face belongs to
3. Clustering - Discover people with similar facial characteristics
Explaining exactly what a neural-network embedding is would take a while; simply put, embedding learning in neural networks lets us represent discrete variables as continuous vectors, so that nearest neighbors can be found in a low-dimensional space. word2vec in TensorFlow is one example of embedding. Once the embedding network is trained, the target tasks become very simple (a short sketch follows the list below):
Face verification - becomes a similarity comparison problem
Face recognition - becomes a KNN classification problem
Face clustering - can be completed through K-Means.
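To make this concrete, here is a minimal, hypothetical sketch of all three tasks built on nothing but embedding distances. It assumes you already have L2-normalized face embeddings as NumPy arrays; the gallery data, the identity labels, and the verification threshold are illustrative values, not taken from the article.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical gallery of L2-normalized face embeddings (one per identity)
rng = np.random.default_rng(42)
gallery = rng.standard_normal((10, 128))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
labels = np.arange(10)
# A probe embedding: a slightly perturbed copy of identity 3
probe = gallery[3] + 0.01 * rng.standard_normal(128)
probe /= np.linalg.norm(probe)

# 1. Verification: same face if the distance is below a threshold (illustrative value)
same_face = np.linalg.norm(gallery[3] - probe) < 1.1

# 2. Recognition: a 1-nearest-neighbor classifier over the gallery embeddings
knn = KNeighborsClassifier(n_neighbors=1).fit(gallery, labels)
identity = knn.predict(probe[None, :])[0]

# 3. Clustering: group similar faces with K-Means
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(gallery)
print(same_face, identity, clusters)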
Other face recognition networks achieve recognition by adding a classification layer; the face feature data output before that classification layer can have thousands of dimensions and does not generalize well to a new face. FaceNet instead trains with a triplet loss inspired by LMNN and outputs a 128-dimensional continuous vector. A triplet consists of two matching face thumbnails (anchor and positive) and one non-matching face thumbnail (negative), and the training objective of the loss is to keep the distance margin between the matching and non-matching pairs sufficiently large. The face thumbnails are tight crops of the face region; they are not aligned and only undergo simple geometric transformations such as scaling and translation. The overall network architecture and the triplet representation are as follows:
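For reference, the triplet loss can be written as L = mean(max(||f(a) − f(p)||² − ||f(a) − f(n)||² + α, 0)). The NumPy sketch below only illustrates that formula; the margin value α = 0.2 is an assumption, not a value given in the article.

import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss over batches of embeddings.

    anchor and positive share an identity, negative is a different identity;
    the loss pushes the matching pair closer than the non-matching pair
    by at least the margin alpha (0.2 is an illustrative value).
    """
    pos_dist = np.sum(np.square(anchor - positive), axis=1)   # ||f(a) - f(p)||^2
    neg_dist = np.sum(np.square(anchor - negative), axis=1)   # ||f(a) - f(n)||^2
    return np.mean(np.maximum(pos_dist - neg_dist + alpha, 0.0))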
Positive and negative samples must be chosen as hard examples when forming triplets; this leads to better training results and faster model convergence.
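A common way to pick such hard examples is to choose, for each anchor-positive pair, a semi-hard negative that lies farther away than the positive but still inside the margin. The following is a simplified, hypothetical selection pass for a single pair, not the exact batch-level scheme used in the paper or the repo.

import numpy as np

def pick_semi_hard_negative(anchor, positive, negatives, alpha=0.2):
    """Return the index of a semi-hard negative for one anchor-positive pair."""
    pos_dist = np.sum(np.square(anchor - positive))
    neg_dists = np.sum(np.square(negatives - anchor), axis=1)
    # Semi-hard: farther than the positive, but still violating the margin
    candidates = np.where((neg_dists > pos_dist) & (neg_dists < pos_dist + alpha))[0]
    if len(candidates) == 0:
        return int(np.argmin(neg_dists))          # fall back to the hardest negative
    return int(candidates[np.argmin(neg_dists[candidates])])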
Code Implementation: We borrow a TensorFlow implementation of FaceNet from GitHub and download its pre-trained FaceNet model, which outputs 512-dimensional embedding data; comparing these embeddings gives the face similarity. The comparison uses a cosine-based distance measurement in which a distance of 0 between two embeddings means they are identical, a distance of 1 means they are completely opposite, and a value of 0.5 (an angle of 90 degrees) means they are unrelated.
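For clarity, those distance values come from an angle-based cosine measurement. The minimal standalone sketch below maps the angle between two embeddings into [0, 1] via arccos(cos θ)/π, which is how I read the cosine option of facenet.distance in the repo; treat it as an illustration rather than the library's exact code.

import numpy as np

def cosine_distance(emb1, emb2):
    """Angle-based distance in [0, 1]:
    0 = identical direction, 0.5 = orthogonal (90 degrees, unrelated), 1 = opposite."""
    cos_sim = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
    return np.arccos(np.clip(cos_sim, -1.0, 1.0)) / np.pi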
Download the pre-trained FaceNet model:
https://github.com/davidsandberg/facenet
The code for loading the model and sample data is as follows:
import tensorflow as tf
import facenet

with tf.Graph().as_default():
    with tf.Session() as sess:
        # Load the pre-trained FaceNet model
        facenet.load_model("D:/python/facenet/data/20180408-102900")
        # Get input and output tensors
        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
        # Run forward pass to calculate embeddings for the labeled gallery faces
        images, names = load_face_labels()
        num_labels = len(names)
        feed_dict = {images_placeholder: images, phase_train_placeholder: False}
        labels_embeddings = sess.run(embeddings, feed_dict=feed_dict)
The code for face comparison is as follows:
def face_compare(face_image):
    # Run forward pass to calculate the embedding of the detected face
    feed_dict = {images_placeholder: face_image, phase_train_placeholder: False}
    embedding_result = sess.run(embeddings, feed_dict=feed_dict)
    minDist = 1000000
    label_index = -1
    for i in range(num_labels):
        # Cosine-based distance (distance_metric=1); keep the closest labeled face
        dist = facenet.distance([labels_embeddings[i]], [embedding_result[0]], 1)[0]
        if minDist > dist:
            label_index = i
            minDist = dist
    print("minDist: %1.4f" % minDist)
    # Reject faces that are too far from every labeled person
    if minDist > 0.25:
        return "unknown"
    if names[label_index] == 3:
        return "zhigang"
    if names[label_index] == 2:
        return "peter"
    if names[label_index] == 1:
        return "balvin"
OpenCV real-time camera call and output:
capture = cv.VideoCapture(0)
height = capture.get(cv.CAP_PROP_FRAME_HEIGHT)
width = capture.get(cv.CAP_PROP_FRAME_WIDTH)
out = cv.VideoWriter("D:/faces.mp4", cv.VideoWriter_fourcc('D', 'I', 'V', 'X'), 15,
                     (int(width), int(height)), True)
while True:
    ret, frame = capture.read()
    if ret is True:
        frame = cv.flip(frame, 1)
        cv.imshow("frame", frame)
        rgb = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
        result, bb = mtcnn_align_demo(rgb)
        if result is not None:
            name = face_compare(result)
            cv.rectangle(frame, (bb[0], bb[1]), (bb[2], bb[3]), (0, 0, 255), 1, 8, 0)
            cv.putText(frame, name, (bb[0], bb[1]), cv.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2, 8)
        cv.imshow("face recognition demo", frame)
        out.write(frame)
        cv.imwrite("D:/face_recognition.png", frame)
        c = cv.waitKey(10)
        if c == 27:
            break
    else:
        break
out.release()
capture.release()
cv.destroyAllWindows()
Face data is collected with MTCNN real-time face detection: face images are captured for each person to be compared and then labeled. The data collection code is as follows:
while True:
    ret, frame = capture.read()
    if ret is True:
        frame = cv.flip(frame, 1)
        cv.imshow("frame", frame)
        rgb = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
        result = detection(rgb)
        out.write(result)
        c = cv.waitKey(10)
        if c == 27:
            break
    else:
        break
cv.destroyAllWindows()
For more details about MTCNN, see recommended reading.
Running Effect: Balvin and I