# Face Mask Detection
## Table of Contents
- [Project Overview](#project-overview)
- [The Data](#the-data)
- [Neural Network Models](#neural-network-models)
- [Results](#results)
- [Future Directions](#future-directions)
## Project Overview
Over a year into the COVID-19 pandemic, wearing masks in public places has become the new normal. Despite much resistance, mask wearing remains critical to keeping the general population safe and minimizing the spread of the virus.
The goal of this project was to build a neural network model that determines whether the person in an image is wearing a face mask. While the model takes static images as input, it could be extended to analyze surveillance camera footage and flag people who are potentially violating mask-wearing mandates.
## The Data
The dataset for this project can be found on Kaggle. It contains labeled training, testing, and validation sets of images; each image contains one individual who is either wearing or not wearing a mask.
The images are fairly clean: each individual is clearly presented with minimal background noise. As a result, little preprocessing was required; the only step was to resize the images to satisfy the input dimension requirements of the candidate models, as in the sketch below.
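A minimal PyTorch sketch of the loading and resizing step, assuming 224×224 inputs for the pretrained models and the usual ImageNet normalization (the directory names below are placeholders for the Kaggle folder layout):

```python
import torch
from torchvision import datasets, transforms

# Resize to 224x224 (the input size expected by MobileNet V2 and ResNet-18)
# and normalize with the ImageNet statistics used by the pretrained weights.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical directory layout: one folder per class under each split.
train_data = datasets.ImageFolder("data/Train", transform=preprocess)
val_data = datasets.ImageFolder("data/Validation", transform=preprocess)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=32, shuffle=False)
```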
## Neural Network Models
For this project, we compared three candidate models, all trained with the Adam optimizer and the cross-entropy loss function (a sketch of the custom architecture follows the list):
- A convolutional neural network built from scratch (4 convolutional and 2 linear layers), trained with various epoch counts and learning rates. See this notebook for more details.
- A pretrained MobileNet V2, trained for 5 epochs.
- A pretrained ResNet-18, trained for 5 epochs.
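For illustration, a rough sketch of what the from-scratch CNN could look like; only the 4-convolution / 2-linear structure comes from the description above, while the channel counts, kernel sizes, and pooling choices are assumptions:

```python
import torch.nn as nn

class MaskCNN(nn.Module):
    """From-scratch CNN: four convolutional blocks followed by two linear layers.
    Channel counts and kernel sizes here are illustrative assumptions."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 14 * 14, 256), nn.ReLU(),  # assumes 224x224 inputs
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```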
One of the benefits of using a pretrained model is that the pre-initialized weights allow for faster convergence.
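As a hedged sketch of the transfer-learning setup with the Adam optimizer and cross-entropy loss, swapping the classifier head for two output classes (an assumption about how the pretrained networks were adapted; `train_loader` comes from the preprocessing sketch above):

```python
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Start from ImageNet-pretrained weights and replace the classifier head
# with a two-class (mask / no-mask) output layer.
model = models.mobilenet_v2(pretrained=True)
model.classifier[1] = nn.Linear(model.last_channel, 2)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```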
## Results
We used accuracy as the evaluation metric when comparing the models, training each on the training set and validating on the validation set. The table below summarizes the results.
| Model        | Training Acc | Validation Acc |
|--------------|--------------|----------------|
| CNN          | 0.900        | 0.899          |
| MobileNet V2 | 0.980        | 0.950          |
| ResNet-18    | 0.972        | 0.948          |
The from-scratch CNN struggled to keep up with the other two models: it needed 20 epochs of training to reach the 0.900/0.899 accuracies, which still fall short of the pretrained models. We dropped it from the candidate pool to save on computational power.
We chose MobileNet V2 as our final model, as it achieved the highest accuracy on the training and validation data. To evaluate it, we retrained the model on a combined dataset composed of the training and validation sets and evaluated it on the test set, which contained a total of 100 images: 50 mask wearers and 50 non-mask wearers. After training for 3 epochs, we obtained a final accuracy of 98%, with only 2 non-mask images misclassified.
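A sketch of how this final retraining and evaluation could be wired up, assuming the datasets and model from the earlier sketches plus a hypothetical `test_loader` built the same way as the other loaders:

```python
from torch.utils.data import ConcatDataset, DataLoader

# Retrain on the training + validation data combined, then score on the held-out test set.
combined = ConcatDataset([train_data, val_data])
combined_loader = DataLoader(combined, batch_size=32, shuffle=True)

# ... retrain `model` for 3 epochs on combined_loader using the training loop above ...

model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {correct / total:.3f}")
```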
## Future Directions
This dataset was very clean and contained fairly straightforward images, with individuals facing forward and minimal background distractions. This likely contributed to our high accuracies. In future iterations of this project, we would like to develop the model on more diverse, noisier data so that it becomes more robust and maintains its high accuracy on more realistic inputs. More diverse data would also give us an opportunity to explore the model's weaknesses.
With further improvements, this model could be very helpful in the public health sector. It could be used to characterize mask-wearing tendencies at different locations, which could have many applications, such as demonstrating the efficacy of mask wearing in preventing the spread of the virus and helping create a safer and healthier world for us all.