Facial Recognition


Introduction, Research & Terminology

Research and References:

This project performs face recognition on both images and video streams using OpenCV, Python, and deep learning. Deep learning-based facial embeddings are both (1) highly accurate and (2) capable of being executed in real time.

Face recognition with OpenCV, Python, and deep learning

Let’s start with a brief introduction to how deep learning-based facial recognition works. This project provides the libraries and code needed to perform face recognition on both still images and video streams, fast enough to run in real time.

How do deep learning face recognition embeddings work in practice?

The secret is a technique called deep metric learning.

If you have any prior experience with deep learning, you know that we typically train a network to:

  • Accept a single input image
  • Output a classification/label for that image

However, deep metric learning is different.

Instead of outputting a single label (or even the coordinates/bounding box of objects in an image), the network outputs a real-valued feature vector.

For the dlib facial recognition network, the output feature vector is 128-d (i.e., a list of 128 real-valued numbers) that is used to quantify the face. Training the network is done using triplets:


Figure 1: Facial recognition via deep metric learning involves a “triplet training step.” The triplet consists of three unique face images, two of which show the same person. The NN generates a 128-d vector for each of the three face images. For the two face images of the same person, we tweak the neural network weights to move their vectors closer together under the distance metric. Image credit: Adam Geitgey’s “Machine Learning is Fun” blog

Here we provide three images to the network:

  • Two of these images are example faces of the same person.
  • The third image is a random face from our dataset and is not the same person as the other two images.

As an example, let’s again consider Figure 1 above, where we provided three images: one of Chad Smith and two of Will Ferrell.

Our network quantifies the faces, constructing the 128-d embedding (quantification) for each.

From there, the general idea is that we tweak the weights of the neural network so that the 128-d measurements of the two Will Ferrell images end up closer to each other and farther from the measurements for Chad Smith.
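The weight update described above is usually driven by a triplet loss. The sketch below shows the textbook hinge form popularized by FaceNet, written here with plain (not squared) Euclidean distances; the 0.2 margin is an illustrative value, and dlib’s own training objective may differ in detail. The loss is zero once the different person’s embedding is at least `margin` farther from the anchor than the same person’s embedding is.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor/positive are embeddings of the same person, negative of someone else.
    The margin value is illustrative, not taken from dlib.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0]    # toy 2-d "embeddings" for readability (real ones are 128-d)
positive = [0.1, 0.0]  # same person, already close to the anchor
negative = [1.0, 0.0]  # different person, already far away
print(triplet_loss(anchor, positive, negative))  # 0.0: nothing left to learn from this triplet
print(triplet_loss(anchor, negative, positive))  # > 0: roles swapped, so weights would be updated
```

During training, the gradient of this loss is what nudges the network weights so that same-person embeddings cluster and different-person embeddings separate.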

The network architecture used for face recognition is based on ResNet-34 from the “Deep Residual Learning for Image Recognition” paper by He et al., but with fewer layers and half the number of filters.

The network itself was trained by Davis King on a dataset of ~3 million images. On the Labeled Faces in the Wild (LFW) benchmark, the network compares favorably with other state-of-the-art methods, reaching 99.38% accuracy.

Both Davis King (the creator of dlib) and Adam Geitgey (the author of the face_recognition module we’ll be using shortly) have written detailed articles on how deep learning-based facial recognition works. I highly encourage you to read them for more details on how deep learning facial embeddings work.


Our Project

Our project focuses on using dlib’s 128-d facial embeddings, accessed through the face_recognition module, to provide accurate facial recognition.
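Because each face becomes a list of 128 numbers, deciding whether two faces belong to the same person reduces to measuring the distance between two vectors. The sketch below is a dependency-free illustration, not the face_recognition implementation: it uses Euclidean distance and a 0.6 threshold, which mirror the defaults of face_recognition.face_distance and face_recognition.compare_faces. The vectors in the usage lines are made up; real ones would come from face_recognition.face_encodings.

```python
import math

# face_recognition.compare_faces uses tolerance=0.6 by default; lower is stricter.
DEFAULT_TOLERANCE = 0.6


def face_distance(a, b):
    """Euclidean distance between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_same_person(a, b, tolerance=DEFAULT_TOLERANCE):
    """Treat two faces as the same person when their embeddings are within `tolerance`."""
    return face_distance(a, b) <= tolerance


# Made-up 128-d vectors standing in for real face_recognition.face_encodings output.
known = [0.0] * 128
probe_same = [0.05] * 128   # distance is about 0.566, inside the tolerance
probe_other = [0.1] * 128   # distance is about 1.131, outside the tolerance
print(is_same_person(known, probe_same))   # True
print(is_same_person(known, probe_other))  # False
```

Lowering the tolerance trades missed matches for fewer false matches, which can help when several people in the dataset look alike.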


The Data

The dataset used here was constructed in under 30 minutes using the method discussed in the “How to (quickly) build a deep learning image dataset” tutorial. Given this dataset of images, we:

  • Create the 128-d embeddings for each face in the dataset
  • Use these embeddings to recognize the faces of the characters in both images and video streams
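Step 1 (creating the embeddings) could look roughly like the following. This is a sketch, not this project’s actual script: it assumes a hypothetical dataset/<person_name>/<image> folder layout, hypothetical helper names (list_images is a stand-in for imutils.paths.list_images), hypothetical command-line flags, and a hypothetical pickle format with parallel "encodings" and "names" lists. It calls face_recognition.face_locations to find faces and face_recognition.face_encodings to compute one 128-d embedding per face.

```python
import argparse
import os
import pickle

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def list_images(dataset_dir):
    """Yield (image_path, person_name) pairs, assuming dataset/<name>/<image> layout."""
    for root, _dirs, files in os.walk(dataset_dir):
        for filename in sorted(files):
            if filename.lower().endswith(IMAGE_EXTENSIONS):
                yield os.path.join(root, filename), os.path.basename(root)


def encode_dataset(dataset_dir, detection_model="hog"):
    """Compute a 128-d embedding for every detected face and record whose face it is."""
    # Heavy dependencies are imported here so the other helpers work without them.
    import cv2
    import face_recognition

    encodings, names = [], []
    for path, name in list_images(dataset_dir):
        image = cv2.imread(path)
        if image is None:  # unreadable or corrupt file
            continue
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; dlib expects RGB
        boxes = face_recognition.face_locations(rgb, model=detection_model)
        for encoding in face_recognition.face_encodings(rgb, boxes):
            encodings.append(encoding)
            names.append(name)
    return {"encodings": encodings, "names": names}


def build_parser():
    parser = argparse.ArgumentParser(description="Encode a face dataset into 128-d embeddings.")
    parser.add_argument("-i", "--dataset", required=True, help="root folder of dataset/<name>/<image>")
    parser.add_argument("-e", "--encodings", required=True, help="output pickle file")
    parser.add_argument("-d", "--detection-method", choices=("hog", "cnn"), default="hog",
                        help="'hog' is faster; 'cnn' is more accurate but slow without a GPU")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    data = encode_dataset(args.dataset, args.detection_method)
    with open(args.encodings, "wb") as f:
        pickle.dump(data, f)
```

Saved as, for example, encode_faces.py with an `if __name__ == "__main__": main()` guard at the bottom, it could be run as `python encode_faces.py --dataset dataset --encodings encodings.pickle`. The default "hog" detector is the pragmatic choice on a Raspberry Pi, since the "cnn" detector is far slower without a GPU.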



Hardware list:

  • Raspberry Pi 3 Model B+
  • Camera
  • 3.5-inch screen


Required libraries:

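  • imutils
  • face_recognition (built on dlib)
  • argparse (Python standard library)
  • pickle (Python standard library)
  • cv2 (OpenCV)
  • os (Python standard library)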
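For step 2, recognizing a face, one simple approach (an assumption, not necessarily this project’s method) is to let every stored embedding vote: each known embedding within the 0.6 tolerance votes for its person’s name, and the name with the most votes wins. The sketch assumes the embeddings were pickled as a dict with parallel "encodings" and "names" lists (a hypothetical format), and uses math.dist in place of face_recognition.compare_faces so it runs without dlib.

```python
import math
import pickle
from collections import Counter


def load_encodings(path):
    """Load a pickled dict of the assumed form {"encodings": [...], "names": [...]}."""
    with open(path, "rb") as f:
        return pickle.load(f)


def recognize(face_encoding, data, tolerance=0.6):
    """Return the most-voted name for one 128-d encoding, or "Unknown" if nothing is close.

    Each stored encoding whose Euclidean distance to `face_encoding` is at most
    `tolerance` votes for its name (the same test face_recognition.compare_faces
    applies); ties go to the name that was matched first.
    """
    votes = Counter(
        name
        for known, name in zip(data["encodings"], data["names"])
        if math.dist(known, face_encoding) <= tolerance
    )
    if not votes:
        return "Unknown"
    return votes.most_common(1)[0][0]
```

In a live pipeline, face_encoding would be one element of the face_recognition.face_encodings output for the current camera frame, and the returned name could then be drawn on the frame with OpenCV. Storing several images per person makes the vote more robust to pose and lighting changes.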






Conclusion and Next Steps