How to do Facial Emotion Recognition Using A CNN? (2023)

Hola everyone!

It’s been a long time. So let me start by giving a recap of what I was doing during this time. I moved to the beautiful town of Boulder, Colorado, USA to pursue a Master of Science (MS) degree from the University of Colorado Boulder!

I started working on a project involving human-robot interaction at the Collaborative AI and Robotics Laboratory at my university and have met lots of amazing people in a very short time!

Alright, enough of that, let’s get to the task at hand. This post is meant to hopefully answer the question-

How to do Facial Emotion Recognition Using a Convolutional Neural Network?

Before we start with the specifics, let’s start with some basics!

What the f is a convolutional neural network?

Right now, all you need to know is that a Convolutional Neural Network, or CNN as it is popularly called, is made up of two parts-

  1. The hidden layers / feature extraction part
     • convolutions
     • pooling
  2. The classifier part

Alright, but what the f is a convolution?

Convolution is a mathematical operation that combines two functions to produce a third function. In a CNN, the convolution is performed on the input data using a filter (also called a kernel) to produce a feature map.
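To make that concrete, here is a minimal NumPy sketch of a filter sliding over a tiny image. The function name conv2d_valid, the 5x5 toy image and the edge filter values are my own assumptions for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter with the patch under it and sum the result.
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

# Toy 5x5 "image": dark on the left, bright on the right.
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hypothetical 3x3 vertical-edge filter (values chosen for illustration).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # each row is [0. 3. 3.]
```

The feature map is zero where the patch is flat and lights up where the filter sits on the edge, which is exactly the kind of "feature" a convolution layer learns to detect.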

But, you mentioned something called pooling too?

A pooling layer is added after a convolution layer. It downsamples the feature map, which reduces the number of computations (and the number of parameters in the layers that follow), thereby shortening training time and helping to control overfitting. One such technique is max-pooling, which takes the maximum value in each window; this shrinks the feature map while keeping the most significant information.
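Here is a small sketch of max-pooling in plain NumPy, assuming a 2x2 window with a stride of 2. The helper name max_pool2d and the example values are made up for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each `size` x `size` window, moving `stride` cells at a time."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = window.max()
    return pooled

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 9, 5],
                        [1, 1, 3, 7]], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)  # [[6. 2.]
               #  [2. 9.]]
```

The 4x4 map shrinks to 2x2, but the strongest response in each region survives.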

Now let’s move to the last thing we need to know before we get our hands dirty: Dropout, a technique where randomly selected neurons are ignored during training. They are “dropped out” randomly. This is a great technique for reducing overfitting and getting well-generalized results.
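If you want to see dropout in action, here is a minimal NumPy sketch. It uses the "inverted" variant, where the surviving activations are scaled up by 1 / (1 - rate) during training so nothing needs to change at test time (this is also how the Keras Dropout layer behaves). The function name and the rate of 0.5 are assumptions for illustration.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero out a random fraction `rate` of activations during training."""
    if not training or rate == 0.0:
        # At test time every neuron is kept and nothing is rescaled.
        return activations
    keep_prob = 1.0 - rate
    # True where the neuron survives, False where it is dropped.
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors so the expected activation stays the same.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((2, 8))

dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)
print(dropped)    # a mix of 0.0 (dropped) and 2.0 (kept and rescaled)
print(unchanged)  # all ones
```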

Still confused?

Don’t you worry, just read this awesome post by Daphne Cornelisse and you will get the hang of things.

Now let’s start coding!

We will be working with Kaggle’s FER2013 dataset. Download it from Kaggle and extract the CSV file.

I will follow a line by line approach so that it’s easier to understand. Let’s start with preprocessing. You can fork the repository for this code if you wish to follow along.

This is a fairly simple step which involves getting the data and storing it in a way that would be easier for us to use.

Lines 1–7: Importing the libraries and reading the CSV file.

From line 8 onward: We get the training features X and labels y from the pixels and emotion columns of the CSV, respectively, and convert them into NumPy arrays. We also add an extra dimension to the feature array with np.expand_dims(), which gives each image the channel axis that the CNN we design later will expect. Both features and labels are stored as .npy files to be used later.

After we execute the code above, our output would look something like this-

Preprocessing Done
Number of Features: 48
Number of Labels: 7
Number of examples in dataset:35887
X,y stored in fdataX.npy and flabels.npy respectively
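For readers without the repository at hand, the preprocessing step described above can be sketched roughly like this. It is not the original script: the function name parse_fer_rows is hypothetical, the one-hot encoding of the labels is my assumption (based on the 7 labels reported above), and two synthetic rows stand in for the real CSV so the snippet runs on its own. The emotion, pixels and Usage column names are those of the FER2013 CSV.

```python
import csv
import io
import numpy as np

WIDTH, HEIGHT, NUM_LABELS = 48, 48, 7

def parse_fer_rows(csv_file):
    """Turn FER2013-style CSV rows into a feature array X and one-hot labels y."""
    faces, emotions = [], []
    for row in csv.DictReader(csv_file):
        # "pixels" holds 48*48 space-separated grayscale values.
        pixels = np.array([float(p) for p in row["pixels"].split()], dtype=np.float32)
        faces.append(pixels.reshape(HEIGHT, WIDTH))
        emotions.append(int(row["emotion"]))
    # Add a trailing channel axis: (n, 48, 48) -> (n, 48, 48, 1).
    X = np.expand_dims(np.stack(faces), -1)
    # One-hot encode the labels (an assumption, see the note above).
    y = np.eye(NUM_LABELS, dtype=np.float32)[emotions]
    return X, y

# Two synthetic rows stand in for the real file here.
fake_pixels = " ".join(["128"] * (WIDTH * HEIGHT))
sample = io.StringIO(
    "emotion,pixels,Usage\n"
    f"3,{fake_pixels},Training\n"
    f"6,{fake_pixels},PublicTest\n"
)

X, y = parse_fer_rows(sample)
print(X.shape, y.shape)  # (2, 48, 48, 1) (2, 7)

# With the real data you would open the extracted CSV instead and then
# store the arrays, e.g. np.save("fdataX.npy", X) and np.save("flabels.npy", y).
```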

Now let’s start developing our model. I will divide the process into multiple steps so that it’s not too overwhelming.

Lines 1–11: Importing the required libraries for our CNN.

Lines 12–23: Okay, there is a lot going on here. First we declare the variables we will need for training our CNN. The images have a 48x48-pixel resolution, so width and height are both 48. Then we have 7 emotions that we are predicting (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral), so we have 7 labels. We will be processing our inputs with a batch size of 64.

Next, we load the features and labels into x and y respectively and standardize x by subtracting the mean and dividing by the standard deviation.
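A quick sketch of that standardization step, assuming the mean and standard deviation are computed per pixel position across all images (the post does not say which axis is used, so treat axis=0 as my assumption). Random data stands in for the real features here.

```python
import numpy as np

# Random stand-in for the real features: 100 grayscale 48x48 images.
rng = np.random.default_rng(42)
x = rng.integers(0, 256, size=(100, 48, 48, 1)).astype(np.float32)

# Per-pixel statistics across the dataset (axis 0 is the example axis).
mean = x.mean(axis=0)
std = x.std(axis=0)

# The small constant guards against division by zero for constant pixels.
x_standardized = (x - mean) / (std + 1e-7)

print(x_standardized.mean(), x_standardized.std())  # close to 0 and close to 1
```

After this step every pixel position has roughly zero mean and unit variance, which usually makes training behave better than feeding in raw 0 to 255 values.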

Lines 24–35: The first four lines just print the images by using the pixel values. After that we divide the data into training and testing sets by using sklearn’s train_test_split() function and save the test features and labels to be used later. We also split the training data once more to obtain the validation data, which is used later in the code.
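To show how the two splits fit together, here is a NumPy-only sketch that mimics what train_test_split() does. The helper name holdout_split and the seeds are hypothetical, and the 10% hold-out fraction is my inference: holding out 10% twice from 35887 examples gives exactly the 29068 training and 3230 validation samples that show up in the training log.

```python
import math
import numpy as np

def holdout_split(x, y, test_fraction, seed):
    """Shuffle the examples and hold out `test_fraction` of them (rounded up)."""
    n = len(x)
    n_test = math.ceil(n * test_fraction)
    order = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Same return order as sklearn's train_test_split.
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Placeholder arrays with the dataset's size; real code would load the .npy files.
n_examples = 35887
x = np.zeros((n_examples, 1), dtype=np.float32)
y = np.zeros((n_examples, 7), dtype=np.float32)

# First split: hold out the test set.
x_train, x_test, y_train, y_test = holdout_split(x, y, 0.1, seed=42)
# Second split: carve the validation set out of the remaining training data.
x_train, x_valid, y_train, y_valid = holdout_split(x_train, y_train, 0.1, seed=41)

print(len(x_train), len(x_valid), len(x_test))  # 29068 3230 3589
```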

Now let’s move to the next chunk of code.

This step is the most important part of the entire process, as we design the CNN through which we will pass our features to train the model and eventually test it using the test features. We have used a combination of several different functions to construct the CNN, which we will discuss one by one.

  1. Sequential() - A sequential model is just a linear stack of layers, which means putting layers on top of each other as we progress from the input layer to the output layer. You can read more about this in the Keras Documentation.
  2. model.add(Conv2D()) - This is a 2D convolutional layer which performs the convolution operation as described at the beginning of this post. To quote the Keras Documentation, “This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs.” Here we are using a 3x3 kernel size and Rectified Linear Unit (ReLU) as our activation function.
  3. model.add(BatchNormalization()) - It normalizes the inputs to the next layer over each batch so that they have roughly zero mean and unit variance (followed by a learned scale and shift), instead of being scattered all over the place.
  4. model.add(MaxPooling2D()) - This function performs the pooling operation on the data as explained at the beginning of the post. We are taking a pooling window of 2x2 with 2x2 strides in this model. If you want to read more about MaxPooling you can refer to the Keras Documentation or the post mentioned above.
  5. model.add(Dropout()) - As explained above, Dropout is a technique where randomly selected neurons are ignored during training. They are “dropped out” randomly. This reduces overfitting.
  6. model.add(Flatten()) - This just flattens each input from ND to 1D; the batch dimension is not affected.
  7. model.add(Dense()) - According to the Keras Documentation, Dense implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer. In simple words, it is the final nail in the coffin: it uses the features learned by the earlier layers and maps them to the label. During testing, this layer is responsible for producing the final label for the image being processed.

After the model.summary() function is executed, the output looks something like this -

[Figure: output of model.summary()]
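To tie the seven building blocks together, here is a minimal sketch of how they can be stacked in Keras. This is not the exact architecture from the repository: the number of convolution blocks, the filter counts (64 and 128), the "same" padding, the dense layer size and the dropout rate of 0.5 are all my assumptions. Only the 48x48x1 input, the 3x3 kernels, ReLU, the 2x2 pooling with 2x2 strides and the 7 output labels come from the description above. Its summary will therefore differ from the one in the figure.

```python
from tensorflow.keras import layers, models

WIDTH, HEIGHT, NUM_LABELS = 48, 48, 7

model = models.Sequential([
    layers.Input(shape=(HEIGHT, WIDTH, 1)),

    # Feature extraction block 1 (filter count is an assumption).
    layers.Conv2D(64, kernel_size=(3, 3), activation="relu", padding="same"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),  # 48x48 -> 24x24
    layers.Dropout(0.5),

    # Feature extraction block 2 (filter count is an assumption).
    layers.Conv2D(128, kernel_size=(3, 3), activation="relu", padding="same"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),  # 24x24 -> 12x12
    layers.Dropout(0.5),

    # Classifier part.
    layers.Flatten(),                                 # 12 * 12 * 128 features
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_LABELS, activation="softmax"),   # one probability per emotion
])

model.summary()
```

The softmax on the last Dense layer turns the 7 outputs into probabilities that sum to 1, which is what the categorical cross-entropy loss expects.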

On to the next chunk!

This is a fairly simple chunk of code: first the model is compiled with categorical_crossentropy as the loss function and Adam as the optimizer. We are using accuracy as the metric.

Next, we are fitting the model with the fixed batch size (64 here), epochs (100 here) and validation data which we obtained by splitting the training data earlier. And finally, we are saving the model for some custom tests which I will explain later.

After we run the code above, we will get an output which would look something like this -

Train on 29068 samples, validate on 3230 samples
Epoch 1/100
29068/29068 [==============================] - 34s 1ms/step - loss: 2.0047 - acc: 0.2124 - val_loss: 1.8123 - val_acc: 0.2817
Epoch 2/100
29068/29068 [==============================] - 31s 1ms/step - loss: 1.7918 - acc: 0.2692 - val_loss: 1.6796 - val_acc: 0.3195
Epoch 3/100
29068/29068 [==============================] - 31s 1ms/step - loss: 1.7021 - acc: 0.3148 - val_loss: 1.5516 - val_acc: 0.3957
...
Epoch 100/100
29068/29068 [==============================] - 31s 1ms/step - loss: 0.3083 - acc: 0.9049 - val_loss: 1.3855 - val_acc: 0.6666
Saved model to disk
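To get a feel for the loss numbers in that log, here is a small NumPy sketch of categorical cross-entropy for one-hot labels. The function below is my own illustration of the formula, not the Keras implementation, and the predictions are made up.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-example loss: minus the log of the probability given to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Both examples are "Happy" (class 3), one-hot encoded over 7 emotions.
y_true = np.array([[0, 0, 0, 1, 0, 0, 0],
                   [0, 0, 0, 1, 0, 0, 0]], dtype=float)

y_pred = np.array([
    [0.02, 0.02, 0.02, 0.85, 0.03, 0.03, 0.03],  # confident and correct
    [1 / 7] * 7,                                 # clueless: uniform guess
])

losses = categorical_crossentropy(y_true, y_pred)
print(losses)  # roughly [0.16, 1.95]
```

A model that guesses uniformly over 7 classes scores ln(7), about 1.95, so the loss of 2.0047 in the first epoch means the network starts out at roughly chance level before it learns anything.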

We can see we got a validation accuracy of 66.6% which is quite good actually! Let’s go a step ahead and test the model on the testing data which we saved earlier by running the file. We will get an output like this-

Loaded model from disk
Accuracy on test set :66.3694622458

This is an exciting result because the model which won the competition had 71.1% accuracy, which means this result puts us into 5th place! Isn’t that awesome!

Now I also created a confusion matrix to find out which emotions usually get confused with each other more often and it looked something like this-

[Figure: confusion matrix of actual vs. predicted emotions]

See how Anger and Disgust were confused with each other as they are very similar negative emotions. Something similar happened with Fear and Sadness.
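If you are curious how such a matrix is built, here is a minimal NumPy sketch. The labels below are made up purely for illustration (they are not results from my model), and the function is a hand-rolled stand-in for whatever the repository uses.

```python
import numpy as np

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows are the actual emotions, columns are the predicted ones."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at counts repeated (row, column) pairs correctly.
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

# Made-up class indices for 8 images (illustration only).
true_labels = np.array([0, 0, 1, 1, 3, 3, 4, 2])
predicted_labels = np.array([0, 1, 0, 1, 3, 3, 4, 4])

cm = confusion_matrix(true_labels, predicted_labels, len(EMOTIONS))
print(cm)

# Correct predictions sit on the diagonal, so accuracy is trace / total.
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.625
```

In this toy example the off-diagonal counts at (Angry, Disgust), (Disgust, Angry) and (Fear, Sad) are exactly the kind of confusions described above.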

Building on this result I am dividing the emotions into 3 categories (Positive, Neutral and Negative) for my next project which involves giving facial emotion recognition capabilities to a robot during navigation!

You can generate your own confusion matrix by running the program from the repository.

To make things more fun, I tested the model on faces of the cast of the popular TV series F.R.I.E.N.D.S, and the results were pretty good!

[Figure: predicted emotions for the cast of F.R.I.E.N.D.S]

Mind you, these are real predicted emotions. You can do the same on your custom test image or use this model in your own project by forking and cloning the repository and running file!

I think that wraps it up real good. It has been a great ride as always.



How is facial emotion recognition done using a CNN?

With a CNN, an input image is filtered through convolution layers to produce a feature map. This map is then fed to fully connected layers, and the facial expression is assigned to a class based on the output of the facial expression classifier.

Can we use a CNN for face recognition?

Yes. CNN models have been built and trained for face recognition. In one approach, the convolution and sampling layers are combined into a single layer to simplify the model, and building on an already trained network is reported to greatly improve the recognition rate.

Which CNN model is used for face recognition?

VGG is a convolutional network designed for classification and localization. In one approach, the features of a face image are extracted with the VGGNet model, their dimensionality is reduced with PCA, and face recognition is finally carried out with an SVM classifier.

What are the steps for facial emotion recognition?

Facial expression recognition is composed of three major steps: (1) face detection and preprocessing of the image, (2) feature extraction and (3) expression classification.

Which algorithm is best for emotion recognition?

There is no single best algorithm. One reported approach uses Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbor (kNN) classifiers to recognize emotions and to estimate the intensity of each recognized emotion.

How does a CNN work step by step?

The pixels from the image are fed to the convolutional layer, which performs the convolution operation and produces a convolved map. A ReLU function is applied to the convolved map to generate a rectified feature map. The image is processed with multiple convolution and ReLU layers to locate the features.

Why are deep CNNs not always preferred for face recognition?

CNNs need a lot of pictures to work well. Many CNN models are trained on millions of pictures; this is not always necessary, but it leads to better results. Imbalanced data can also lead to overfitting.

How does a CNN work for image recognition?

CNNs have an input layer, an output layer and hidden layers, all of which help process and classify images. The hidden layers comprise convolutional layers, ReLU layers, pooling layers and fully connected layers, all of which play a crucial role.

Which technique is best for face recognition?

The PCA+CNN and SOM+CNN methods are both superior to the eigenfaces technique, even when there is only one training image per person. The SOM+CNN method consistently performs better than the PCA+CNN method [8]. Fisherfaces is another of the most successful and widely used methods for face recognition.

Which algorithm is used for facial expression recognition?

One proposed method uses an improved k-nearest neighbor algorithm. Its authors report that the method performs very well when applied to a facial expression recognition system.

Which classifier is best for face recognition?

One study achieved good results using a Support Vector Machine (SVM) as the classifier in a face recognition system, together with PCA [12]. The recognition rate achieved by this study was 98.75% [12].

How is a CNN used in human activity recognition?

CNNs can be applied to human activity recognition data. The CNN model learns to map a given window of signal data to an activity: the model reads across each window of data and prepares an internal representation of the window.

Which algorithm is used in facial emotion recognition?

To improve the reliability of a facial expression recognition system and reduce the chance of false positives, the classification strategy is important in the recognition process. One proposed method improves the k-nearest neighbor algorithm for this purpose.

How does a neural network work in face recognition?

In one approach, the face matching step uses a model that combines many neural networks to match geometric features of the human face. Because the model links many neural networks together, it is called a Multi Artificial Neural Network. The MIT + CMU database is used to evaluate the proposed methods for face detection and alignment.

How does a face detection program work using neural networks?

A retinally connected neural network examines small windows of an image and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network.


Article information

Author: Rubie Ullrich

Last Updated: 04/14/2023
