Author: Jason Brownlee
Face recognition is a computer vision task of identifying and verifying a person based on a photograph of their face.
Recently, deep learning convolutional neural networks have surpassed classical methods and are achieving state-of-the-art results on standard face recognition datasets. Notable examples are the VGGFace and VGGFace2 models developed by researchers at the Visual Geometry Group at Oxford.
Although the model can be challenging to implement and resource intensive to train, it can be easily used in standard deep learning libraries such as Keras through the use of freely available pre-trained models and third-party open source libraries.
In this tutorial, you will discover how to develop face recognition systems for face identification and verification using the VGGFace2 deep learning model.
After completing this tutorial, you will know:
- About the VGGFace and VGGFace2 models for face recognition and how to install the keras_vggface library to make use of these models in Python with Keras.
- How to develop a face identification system to predict the name of celebrities in given photographs.
- How to develop a face verification system to confirm the identity of a person given a photograph of their face.
Let’s get started.
Tutorial Overview
This tutorial is divided into six parts; they are:
- Face Recognition
- VGGFace and VGGFace2 Models
- How to Install the keras-vggface Library
- How to Detect Faces for Face Recognition
- How to Perform Face Identification With VGGFace2
- How to Perform Face Verification With VGGFace2
Face Recognition
Face recognition is the general task of identifying and verifying people from photographs of their face.
The 2011 book on face recognition titled “Handbook of Face Recognition” describes two main modes for face recognition:
- Face Verification. A one-to-one mapping of a given face against a known identity (e.g. is this the person?).
- Face Identification. A one-to-many mapping for a given face against a database of known faces (e.g. who is this person?).
A face recognition system is expected to identify faces present in images and videos automatically. It can operate in either or both of two modes: (1) face verification (or authentication), and (2) face identification (or recognition).
— Page 1, Handbook of Face Recognition. 2011.
We will explore both of these face recognition tasks in this tutorial.
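To make the distinction concrete, the short sketch below contrasts the two modes. It assumes we already have embedding vectors for faces and uses SciPy's cosine distance to compare them; both ideas are developed properly later in the tutorial.

# conceptual sketch: verification (1:1) vs. identification (1:N)
# embeddings are feature vectors for faces (how to compute them is covered later)
from scipy.spatial.distance import cosine

def verify(candidate_embedding, known_embedding, threshold=0.5):
    # one-to-one: is this the claimed person?
    return cosine(candidate_embedding, known_embedding) <= threshold

def identify(candidate_embedding, database):
    # one-to-many: who is this person? database maps name -> embedding
    return min(database, key=lambda name: cosine(candidate_embedding, database[name]))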
VGGFace and VGGFace2 Models
The VGGFace refers to a series of models developed for face recognition and demonstrated on benchmark computer vision datasets by members of the Visual Geometry Group (VGG) at the University of Oxford.
There are two main VGG models for face recognition at the time of writing; they are VGGFace and VGGFace2. Let’s take a closer look at each in turn.
VGGFace Model
The VGGFace model (the name was applied to it later) was described by Omkar Parkhi, et al. in the 2015 paper titled “Deep Face Recognition.”
A contribution of the paper was a description of how to develop a very large training dataset, required to train modern-convolutional-neural-network-based face recognition systems, to compete with the large datasets used to train models at Facebook and Google.
… [we] propose a procedure to create a reasonably large face dataset whilst requiring only a limited amount of person-power for annotation. To this end we propose a method for collecting face data using knowledge sources available on the web (Section 3). We employ this procedure to build a dataset with over two million faces, and will make this freely available to the research community.
— Deep Face Recognition, 2015.
This dataset is then used as the basis for developing deep CNNs for face recognition tasks such as face identification and verification. Specifically, models are trained on the very large dataset, then evaluated on benchmark face recognition datasets, demonstrating that the model is effective at generating generalized features from faces.
They first describe the process of training a face classifier that uses a softmax activation function in the output layer to classify faces as people. This layer is then removed so that the output of the network is a vector feature representation of the face, called a face embedding. The model is then further trained, via fine-tuning, so that the Euclidean distance between vectors generated for the same identity is made smaller and the distance between vectors generated for different identities is made larger. This is achieved using a triplet loss function.
Triplet-loss training aims at learning score vectors that perform well in the final application, i.e. identity verification by comparing face descriptors in Euclidean space. […] A triplet (a, p, n) contains an anchor face image as well as a positive p != a and negative n examples of the anchor’s identity. The projection W’ is learned on target datasets
— Deep Face Recognition, 2015.
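To make the idea concrete, here is a minimal sketch of a triplet loss computed on embedding vectors with NumPy. The margin value is an arbitrary illustration, not the setting used in the paper.

# illustrative triplet loss on embedding vectors (not the paper's exact implementation)
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # squared Euclidean distances between the anchor and the positive/negative examples
    pos_dist = np.sum(np.square(anchor - positive))
    neg_dist = np.sum(np.square(anchor - negative))
    # loss is zero once the negative is at least 'margin' further away than the positive
    return max(pos_dist - neg_dist + margin, 0.0)

# example with random 128-dimensional embeddings
a, p, n = (np.random.rand(128) for _ in range(3))
print(triplet_loss(a, p, n))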
A deep convolutional neural network architecture is used in the VGG style, with blocks of convolutional layers using small kernels and ReLU activations, followed by max pooling layers, and fully connected layers in the classifier end of the network.
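As a rough illustration of that pattern (not the exact VGGFace architecture), a single VGG-style block might be defined in Keras as follows:

# a generic VGG-style block in Keras (illustration only, not the exact VGGFace definition)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

model = Sequential()
# two convolutional layers with small 3x3 kernels and ReLU activations
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
# followed by max pooling to halve the size of the feature maps
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.summary()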
VGGFace2 Model
Qiong Cao, et al. from the VGG describe a follow-up work in their 2017 paper titled “VGGFace2: A dataset for recognizing faces across pose and age.”
They describe VGGFace2 as a much larger dataset that they collected with the intent of training and evaluating more effective face recognition models.
In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians).
— VGGFace2: A dataset for recognising faces across pose and age, 2017.
The paper focuses on how this dataset was collected and curated, and how images were prepared prior to modeling. Nevertheless, VGGFace2 has become the name used to refer to the pre-trained face recognition models that the authors have provided, trained on this dataset.
Models are trained on the dataset, specifically a ResNet-50 model and a Squeeze-and-Excitation ResNet-50 model (called SE-ResNet-50 or SENet), and it is variations of these models that have been made available by the authors, along with the associated code. The models are evaluated on standard face recognition datasets, demonstrating what was then state-of-the-art performance.
… we demonstrate that deep models (ResNet-50 and SENet) trained on VGGFace2, achieve state-of-the-art performance on […] benchmarks.
— VGGFace2: A dataset for recognising faces across pose and age, 2017.
Specifically, the SENet-based model offers better performance in general.
The comparison between ResNet-50 and SENet both learned from scratch reveals that SENet has a consistently superior performance on both verification and identification. […] In addition, the performance of SENet can be further improved by training on the two datasets VGGFace2 and MS1M, exploiting the different advantages that each offer.
— VGGFace2: A dataset for recognising faces across pose and age, 2017.
A face embedding is predicted by a given model as a 2,048-element vector. The vector is then normalized to unit length using the L2 vector norm (i.e. scaled so that its Euclidean distance from the origin is 1). The normalized vector is referred to as the 'face descriptor'. The distance between face descriptors (or groups of face descriptors called a 'subject template') is calculated using the cosine similarity.
The face descriptor is extracted from the layer adjacent to the classifier layer. This leads to a 2048 dimensional descriptor, which is then L2 normalized
— VGGFace2: A dataset for recognising faces across pose and age, 2017.
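As a small worked example of these two steps (using random vectors as stand-ins for real 2,048-element embeddings), L2 normalization and the cosine comparison can be done directly with NumPy and SciPy:

# normalize embeddings to unit length and compare them with cosine similarity
# (uses random vectors as stand-ins for real 2,048-element face embeddings)
import numpy as np
from scipy.spatial.distance import cosine

embedding_a = np.random.rand(2048)
embedding_b = np.random.rand(2048)
# L2 normalization: divide each vector by its Euclidean length
descriptor_a = embedding_a / np.linalg.norm(embedding_a)
descriptor_b = embedding_b / np.linalg.norm(embedding_b)
# scipy's cosine() returns a distance (1 - similarity); smaller means more similar
print(cosine(descriptor_a, descriptor_b))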
How to Install the keras-vggface Library
The authors of VGGFace2 provide the source code for their models, as well as pre-trained models that can be downloaded for use with standard deep learning frameworks such as Caffe and PyTorch, although there are no examples for TensorFlow or Keras.
We could convert the provided models to TensorFlow or Keras format and develop a model definition in order to load and use these pre-trained models. Thankfully, this work has already been done and can be used directly by third-party projects and libraries.
Perhaps the best-of-breed third-party library for using the VGGFace2 (and VGGFace) models in Keras is the keras-vggface project and library by Refik Can Malli.
Given that this is a third-party open-source project and subject to change, I have created a fork of the project here.
This library can be installed via pip; for example:
sudo pip install git+https://github.com/rcmalli/keras-vggface.git
After successful installation, you should then see a message like the following:
Successfully installed keras-vggface-0.5
You can confirm that the library was installed correctly by querying the installed package:
pip show keras-vggface
This will summarize the details of the package; for example:
Name: keras-vggface
Version: 0.5
Summary: VGGFace implementation with Keras framework
Home-page: https://github.com/rcmalli/keras-vggface
Author: Refik Can MALLI
Author-email: mallir@itu.edu.tr
License: MIT
Location: ...
Requires: numpy, scipy, h5py, pillow, keras, six, pyyaml
Required-by:
You can also confirm that the library loads correctly by loading it in a script and printing the current version; for example:
# check version of keras_vggface
import keras_vggface
# print version
print(keras_vggface.__version__)
Running the example will load the library and print the current version.
0.5
How to Detect Faces for Face Recognition
Before we can perform face recognition, we need to detect faces.
Face detection is the process of automatically locating faces in a photograph and localizing them by drawing a bounding box around their extent.
In this tutorial, we will also use the Multi-Task Cascaded Convolutional Neural Network, or MTCNN, for face detection, e.g. finding and extracting faces from photos. This is a state-of-the-art deep learning model for face detection, described in the 2016 paper titled “Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.”
We will use the implementation provided by Iván de Paz Centeno in the ipazc/mtcnn project. This can also be installed via pip as follows:
sudo pip install mtcnn
We can confirm that the library was installed correctly by importing it and printing the version; for example:
# confirm mtcnn was installed correctly
import mtcnn
# print version
print(mtcnn.__version__)
Running the example prints the current version of the library.
0.0.8
We can use the mtcnn library to create a face detector and extract faces for use with the VGGFace face recognition models in subsequent sections.
The first step is to load an image as a NumPy array, which we can achieve using the Matplotlib imread() function.
# load image from file
pixels = pyplot.imread(filename)
Next, we can create an MTCNN face detector class and use it to detect all faces in the loaded photograph.
# create the detector, using default weights
detector = MTCNN()
# detect faces in the image
results = detector.detect_faces(pixels)
The result is a list of detected faces, where each bounding box is given by the x and y coordinates of its corner (the top-left corner in image coordinates), along with the width and height of the box.
If we assume there is only one face in the photo for our experiments, we can determine the pixel coordinates of the bounding box as follows.
# extract the bounding box from the first face
x1, y1, width, height = results[0]['box']
x2, y2 = x1 + width, y1 + height
We can use these coordinates to extract the face.
# extract the face
face = pixels[y1:y2, x1:x2]
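Note that MTCNN can occasionally report a bounding box with a slightly negative x or y coordinate, which would break the array slicing above. A simple defensive fix is to clamp the coordinates to the image before slicing; for example:

# bug fix: the detector can return slightly negative coordinates, so clamp to the image
x1, y1 = max(x1, 0), max(y1, 0)
x2, y2 = min(x2, pixels.shape[1]), min(y2, pixels.shape[0])
face = pixels[y1:y2, x1:x2]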
We can then use the PIL library to resize this small image of the face to the required size; specifically, the model expects square input faces with the shape 224×224.
# resize pixels to the model size
image = Image.fromarray(face)
image = image.resize((224, 224))
face_array = asarray(image)
Tying all of this together, the extract_face() function below will load a photograph from the given filename and return the extracted face.
It assumes that the photo contains one face and will return the first face detected.
# extract a single face from a given photograph
def extract_face(filename, required_size=(224, 224)):
    # load image from file
    pixels = pyplot.imread(filename)
    # create the detector, using default weights
    detector = MTCNN()
    # detect faces in the image
    results = detector.detect_faces(pixels)
    # extract the bounding box from the first face
    x1, y1, width, height = results[0]['box']
    x2, y2 = x1 + width, y1 + height
    # extract the face
    face = pixels[y1:y2, x1:x2]
    # resize pixels to the model size
    image = Image.fromarray(face)
    image = image.resize(required_size)
    face_array = asarray(image)
    return face_array
We can test this function with a photograph.
Download a photograph of Sharon Stone taken in 2013 from Wikipedia released under a permissive license.
Download the photograph and place it in your current working directory with the filename 'sharon_stone1.jpg'.
The complete example of loading the photograph of Sharon Stone, extracting the face, and plotting the result is listed below.
# example of face detection with mtcnn
from matplotlib import pyplot
from PIL import Image
from numpy import asarray
from mtcnn.mtcnn import MTCNN

# extract a single face from a given photograph
def extract_face(filename, required_size=(224, 224)):
    # load image from file
    pixels = pyplot.imread(filename)
    # create the detector, using default weights
    detector = MTCNN()
    # detect faces in the image
    results = detector.detect_faces(pixels)
    # extract the bounding box from the first face
    x1, y1, width, height = results[0]['box']
    x2, y2 = x1 + width, y1 + height
    # extract the face
    face = pixels[y1:y2, x1:x2]
    # resize pixels to the model size
    image = Image.fromarray(face)
    image = image.resize(required_size)
    face_array = asarray(image)
    return face_array

# load the photo and extract the face
pixels = extract_face('sharon_stone1.jpg')
# plot the extracted face
pyplot.imshow(pixels)
# show the plot
pyplot.show()
Running the example loads the photograph, extracts the face, and plots the result.
We can see that the face was correctly detected and extracted.
The results suggest that we can use the developed extract_face() function as the basis for examples with the VGGFace face recognition model in subsequent sections.
How to Perform Face Identification With VGGFace2
In this section, we will use the VGGFace2 model to perform face recognition with photographs of celebrities from Wikipedia.
A VGGFace model can be created using the VGGFace() constructor and specifying the type of model to create via the 'model' argument.
model = VGGFace(model='...')
The keras-vggface library provides three pre-trained models: a VGGFace1 model via model='vgg16' (the default), and two VGGFace2 models, 'resnet50' and 'senet50'.
The example below creates a 'resnet50' VGGFace2 model and summarizes the shape of the inputs and outputs.
# example of creating a face embedding
from keras_vggface.vggface import VGGFace
# create a vggface2 model
model = VGGFace(model='resnet50')
# summarize input and output shape
print('Inputs: %s' % model.inputs)
print('Outputs: %s' % model.outputs)
The first time that a model is created, the library will download the model weights and save them in the .keras/models/vggface/ directory under your home directory. The size of the weights for the resnet50 model is about 158 megabytes, so the download may take a few minutes depending on the speed of your internet connection.
Running the example prints the shape of the input and output tensors of the model.
We can see that the model expects input color images of faces with the shape 224×224 and that the output will be a class prediction across 8,631 people. This makes sense given that the pre-trained models were trained on the 8,631 identities in the VGGFace2 training set (listed in this CSV file).
Inputs: [<tf.Tensor ... shape=(?, 224, 224, 3) dtype=float32>]
Outputs: [<tf.Tensor ... shape=(?, 8631) dtype=float32>]
This Keras model can be used directly to predict the probability that a given face belongs to each of more than eight thousand known celebrities; for example:
# perform prediction
yhat = model.predict(samples)
Once a prediction is made, the class integers can be mapped to the names of the celebrities, and the top five names with the highest probability can be retrieved.
This behavior is provided by the decode_predictions() function in the keras-vggface library.
# convert prediction into names
results = decode_predictions(yhat)
# display most likely results
for result in results[0]:
    print('%s: %.3f%%' % (result[0], result[1]*100))
Before we can make a prediction with a face, the pixel values must be scaled in the same way that data was prepared when the VGGFace model was fit. Specifically, the pixel values must be centered on each channel using the mean from the training dataset.
This can be achieved using the preprocess_input() function provided in the keras-vggface library and specifying 'version=2' so that the images are centered using the mean pixel values used to train the VGGFace2 models, rather than those for the VGGFace1 models (the default).
# convert one face into samples
pixels = pixels.astype('float32')
samples = expand_dims(pixels, axis=0)
# prepare the face for the model, e.g. center pixels
samples = preprocess_input(samples, version=2)
We can tie all of this together and predict the identity of our Sharon Stone photograph downloaded in the previous section, specifically 'sharon_stone1.jpg'.
The complete example is listed below.
# Example of face detection with a vggface2 model
from numpy import expand_dims
from matplotlib import pyplot
from PIL import Image
from numpy import asarray
from mtcnn.mtcnn import MTCNN
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input
from keras_vggface.utils import decode_predictions

# extract a single face from a given photograph
def extract_face(filename, required_size=(224, 224)):
    # load image from file
    pixels = pyplot.imread(filename)
    # create the detector, using default weights
    detector = MTCNN()
    # detect faces in the image
    results = detector.detect_faces(pixels)
    # extract the bounding box from the first face
    x1, y1, width, height = results[0]['box']
    x2, y2 = x1 + width, y1 + height
    # extract the face
    face = pixels[y1:y2, x1:x2]
    # resize pixels to the model size
    image = Image.fromarray(face)
    image = image.resize(required_size)
    face_array = asarray(image)
    return face_array

# load the photo and extract the face
pixels = extract_face('sharon_stone1.jpg')
# convert one face into samples
pixels = pixels.astype('float32')
samples = expand_dims(pixels, axis=0)
# prepare the face for the model, e.g. center pixels
samples = preprocess_input(samples, version=2)
# create a vggface model
model = VGGFace(model='resnet50')
# perform prediction
yhat = model.predict(samples)
# convert prediction into names
results = decode_predictions(yhat)
# display most likely results
for result in results[0]:
    print('%s: %.3f%%' % (result[0], result[1]*100))
Running the example loads the photograph, extracts the single face that we know was present, and then predicts the identity for the face.
The top five highest probability names are then displayed.
We can see that the model correctly identifies the face as belonging to Sharon Stone with a likelihood of 99.642%.
b' Sharon_Stone': 99.642%
b' Noelle_Reno': 0.085%
b' Elisabeth_R\xc3\xb6hm': 0.033%
b' Anita_Lipnicka': 0.026%
b' Tina_Maze': 0.019%
We can test the model with another celebrity, in this case, a male, Channing Tatum.
A photograph of Channing Tatum taken in 2017 is available on Wikipedia under a permissive license.
Download the photograph and save it in your current working directory with the filename 'channing_tatum.jpg'.
Change the code to load the photograph of Channing Tatum instead; for example:
pixels = extract_face('channing_tatum.jpg')
Running the example with the new photograph, we can see that the model correctly identifies the face as belonging to Channing Tatum with a likelihood of 94.432%.
b' Channing_Tatum': 94.432%
b' Eoghan_Quigg': 0.146%
b' Les_Miles': 0.113%
b' Ibrahim_Afellay': 0.072%
b' Tovah_Feldshuh': 0.070%
You might like to try this example with other photographs of celebrities taken from Wikipedia. Try a diverse set of genders, races, and ages. You will discover that the model is not perfect, but for those celebrities that it does know well, it can be effective.
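If you want to check several photographs in one run, the same steps can be wrapped in a loop, re-using the extract_face() function and the model from the complete example above. The filenames below are placeholders for photos you download yourself.

# run the identification steps over several photos (filenames are placeholders)
for filename in ['celebrity1.jpg', 'celebrity2.jpg', 'celebrity3.jpg']:
    # detect and extract the face, then prepare it for the model
    pixels = extract_face(filename)
    samples = preprocess_input(expand_dims(pixels.astype('float32'), axis=0), version=2)
    # predict and report the three most likely identities
    yhat = model.predict(samples)
    print(filename)
    for result in decode_predictions(yhat)[0][:3]:
        print('  %s: %.3f%%' % (result[0], result[1]*100))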
You might like to try other versions of the model, such as 'vgg16' and 'senet50', then compare results. For example, I found that with a photograph of Oscar Isaac, the 'vgg16' model is effective, but the VGGFace2 models are not.
The model could be used to identify new faces. One approach would be to re-train the model, perhaps just the classifier part of the model, with a new face dataset.
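A minimal sketch of that idea is shown below: load the pre-trained model without its classifier, freeze it as a feature extractor, and add a new softmax layer sized for your own set of identities. The number of identities and the training data are assumptions you would supply yourself.

# sketch: re-use the pre-trained model as a feature extractor with a new classifier head
from keras.models import Model
from keras.layers import Dense
from keras_vggface.vggface import VGGFace

n_identities = 10  # assumption: the number of new people you want to recognize
# load the pre-trained model without the classifier layers
base = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')
# freeze the pre-trained layers so only the new classifier is trained
for layer in base.layers:
    layer.trainable = False
# add a new classifier for the new identities
output = Dense(n_identities, activation='softmax')(base.output)
model = Model(inputs=base.inputs, outputs=output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(...) would then be called with your own face dataset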
How to Perform Face Verification With VGGFace2
A VGGFace2 model can be used for face verification.
This involves calculating a face embedding for a new given face and comparing the embedding to the embedding for the single example of the face known to the system.
A face embedding is a vector that represents the features extracted from the face. This can then be compared with the vectors generated for other faces. For example, another vector that is close (by some measure) may be the same person, whereas another vector that is far (by some measure) may be a different person.
Typical measures such as Euclidean distance and Cosine distance are calculated between two embeddings and faces are said to match or verify if the distance is below a predefined threshold, often tuned for a specific dataset or application.
First, we can load the VGGFace model without the classifier by setting the 'include_top' argument to 'False', specifying the shape of the input via the 'input_shape' argument, and setting 'pooling' to 'avg' so that the filter maps at the output end of the model are reduced to a vector using global average pooling.
# create a vggface model
model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')
This model can then be used to make a prediction, which will return a face embedding for one or more faces provided as input.
# perform prediction
yhat = model.predict(samples)
We can define a new function that, given a list of filenames for photos containing a face, will extract one face from each photo via the extract_face() function developed in a prior section, pre-process each face for the VGGFace2 model by calling preprocess_input(), and then predict a face embedding for each face.
The get_embeddings() function below implements this, returning an array containing an embedding for one face for each provided photograph filename.
# extract faces and calculate face embeddings for a list of photo files
def get_embeddings(filenames):
    # extract faces
    faces = [extract_face(f) for f in filenames]
    # convert into an array of samples
    samples = asarray(faces, 'float32')
    # prepare the face for the model, e.g. center pixels
    samples = preprocess_input(samples, version=2)
    # create a vggface model
    model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')
    # perform prediction
    yhat = model.predict(samples)
    return yhat
We can take our photograph of Sharon Stone used previously (e.g. sharon_stone1.jpg) as our definition of the identity of Sharon Stone by calculating and storing the face embedding for the face in that photograph.
We can then calculate embeddings for faces in other photographs of Sharon Stone and test whether we can effectively verify her identity. We can also use faces from photographs of other people to confirm that they are not verified as Sharon Stone.
Verification can be performed by calculating the Cosine distance between the embedding for the known identity and the embeddings of candidate faces. This can be achieved using the cosine() SciPy function. The maximum distance between two embeddings is a score of 1.0, whereas the minimum distance is 0.0. A common cut-off value used for face identity is between 0.4 and 0.6, such as 0.5, although this should be tuned for an application.
The is_match() function below implements this, calculating the distance between two embeddings and interpreting the result.
# determine if a candidate face is a match for a known face
def is_match(known_embedding, candidate_embedding, thresh=0.5):
    # calculate distance between embeddings
    score = cosine(known_embedding, candidate_embedding)
    if score <= thresh:
        print('>face is a Match (%.3f <= %.3f)' % (score, thresh))
    else:
        print('>face is NOT a Match (%.3f > %.3f)' % (score, thresh))
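The 0.5 cut-off is only a starting point. If you have a small set of labeled pairs (same person and different person), you can pick the threshold empirically; a minimal sketch with made-up scores is shown below.

# sketch: choose a verification threshold from labeled pairs
# 'scores' are cosine distances and 'labels' are 1 for same person, 0 for different
# (the values below are made-up examples)
scores = [0.32, 0.41, 0.28, 0.71, 0.66, 0.58]
labels = [1, 1, 1, 0, 0, 0]

def accuracy_at(thresh):
    # predict a match whenever the distance is at or below the threshold
    predictions = [1 if s <= thresh else 0 for s in scores]
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

# evaluate a range of candidate thresholds and keep the best one
best = max([t / 100 for t in range(30, 71)], key=accuracy_at)
print('best threshold: %.2f (accuracy %.2f)' % (best, accuracy_at(best)))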
We can test out some positive examples by downloading more photos of Sharon Stone from Wikipedia.
Specifically, a photograph taken in 2002 (download and save as 'sharon_stone2.jpg'), and a photograph taken in 2017 (download and save as 'sharon_stone3.jpg').
We will test these two positive cases and the Channing Tatum photo from the previous section as a negative example.
The complete code example of face verification is listed below.
# face verification with the VGGFace2 model
from matplotlib import pyplot
from PIL import Image
from numpy import asarray
from scipy.spatial.distance import cosine
from mtcnn.mtcnn import MTCNN
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input

# extract a single face from a given photograph
def extract_face(filename, required_size=(224, 224)):
    # load image from file
    pixels = pyplot.imread(filename)
    # create the detector, using default weights
    detector = MTCNN()
    # detect faces in the image
    results = detector.detect_faces(pixels)
    # extract the bounding box from the first face
    x1, y1, width, height = results[0]['box']
    x2, y2 = x1 + width, y1 + height
    # extract the face
    face = pixels[y1:y2, x1:x2]
    # resize pixels to the model size
    image = Image.fromarray(face)
    image = image.resize(required_size)
    face_array = asarray(image)
    return face_array

# extract faces and calculate face embeddings for a list of photo files
def get_embeddings(filenames):
    # extract faces
    faces = [extract_face(f) for f in filenames]
    # convert into an array of samples
    samples = asarray(faces, 'float32')
    # prepare the face for the model, e.g. center pixels
    samples = preprocess_input(samples, version=2)
    # create a vggface model
    model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')
    # perform prediction
    yhat = model.predict(samples)
    return yhat

# determine if a candidate face is a match for a known face
def is_match(known_embedding, candidate_embedding, thresh=0.5):
    # calculate distance between embeddings
    score = cosine(known_embedding, candidate_embedding)
    if score <= thresh:
        print('>face is a Match (%.3f <= %.3f)' % (score, thresh))
    else:
        print('>face is NOT a Match (%.3f > %.3f)' % (score, thresh))

# define filenames
filenames = ['sharon_stone1.jpg', 'sharon_stone2.jpg', 'sharon_stone3.jpg', 'channing_tatum.jpg']
# get embeddings for the filenames
embeddings = get_embeddings(filenames)
# define sharon stone
sharon_id = embeddings[0]
# verify known photos of sharon
print('Positive Tests')
is_match(embeddings[0], embeddings[1])
is_match(embeddings[0], embeddings[2])
# verify known photos of other people
print('Negative Tests')
is_match(embeddings[0], embeddings[3])
The first photo is taken as the template for Sharon Stone and the remaining photos in the list are positive and negative photos to test for verification.
Running the example, we can see that the system correctly verified the two positive cases given photos of Sharon Stone both earlier and later in time.
We can also see that the photo of Channing Tatum is correctly not verified as Sharon Stone. It would be an interesting extension to explore the verification of other negative photos, such as photos of other female celebrities.
Positive Tests
>face is a Match (0.418 <= 0.500)
>face is a Match (0.295 <= 0.500)
Negative Tests
>face is NOT a Match (0.709 > 0.500)
Note: the embeddings generated from the model are not specific to the photos of celebrities used to train the model. The model is believed to produce useful embeddings for any faces; perhaps try it out with photos of yourself compared to photos of relatives and friends.
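For example, a quick experiment of that kind, re-using the get_embeddings() and is_match() functions from the example above, might look like the following; the filenames are placeholders for photos you supply yourself.

# verify your own identity across photos (filenames are placeholders)
embeddings = get_embeddings(['me1.jpg', 'me2.jpg', 'friend.jpg'])
# compare a second photo of yourself and a photo of a friend against the first photo
is_match(embeddings[0], embeddings[1])
is_match(embeddings[0], embeddings[2])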
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Papers
- Deep Face Recognition, 2015.
- VGGFace2: A dataset for recognising faces across pose and age, 2017.
- Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks, 2016.
Books
- Handbook of Face Recognition, Second Edition, 2011.
API
- Visual Geometry Group (VGG) Homepage.
- VGGFace Homepage.
- VGGFace2 Homepage.
- Official VGGFace2 Project, GitHub.
- keras-vggface Project, GitHub.
- MS-Celeb-1M Dataset Homepage.
- scipy.spatial.distance.cosine API
Summary
In this tutorial, you discovered how to develop face recognition systems for face identification and verification using the VGGFace2 deep learning model.
Specifically, you learned:
- About the VGGFace and VGGFace2 models for face recognition and how to install the keras_vggface library to make use of these models in Python with Keras.
- How to develop a face identification system to predict the name of celebrities in given photographs.
- How to develop a face verification system to confirm the identity of a person given a photograph of their face.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.