Automatic Person Detection and Blurring in Videos Using Mask R-CNN

There’s a dog park near my apartment, but unfortunately, it’s on the other side of my building. After work, it gets super popular and is truly a sight to behold. I thought the world deserved to see this little slice of dog heaven. However, one obstacle to that desire is the privacy of the people around the dogs. To overcome this challenge, I explored using pre-trained neural networks to automatically blur the people in a video. Currently, the code is slow, taking nine minutes to process a 14-second video, but I have some ideas for improving it. My current results are below:

Before

Original Video Source

After

Background on Object Detection

Blurring out people in a video requires object detection, a subfield of computer vision concerned with identifying and locating objects in images or videos. The dominant technique in object detection is the convolutional neural network (CNN), a type of deep neural network that can process image data with a high degree of accuracy. Researchers have developed a variety of CNN-based detection architectures, including Faster R-CNN, YOLO, and Mask R-CNN; some, such as YOLO, are fast enough to detect objects in real time, while Mask R-CNN additionally produces a pixel-level mask for each detected object, which is exactly what blurring requires.

Using Pre-trained Models

For this project, I chose a Mask R-CNN model that was pre-trained by TensorFlow. Using a pre-trained model saved time and resources compared to building a model from scratch. Pre-trained models have already learned to recognize a wide variety of objects and can often achieve high levels of accuracy, making them a good choice for many applications.
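As a minimal sketch of the person-filtering step, the snippet below assumes the COCO-style output dictionary that TensorFlow's pre-trained Mask R-CNN models produce (`detection_classes`, `detection_scores`, `detection_masks`, with class 1 mapping to "person"). The helper name `person_masks` is my own illustration, not a function from the notebook, and model loading is omitted:

```python
import numpy as np

# "person" in the COCO label map used by TensorFlow's detection models.
PERSON_CLASS_ID = 1

def person_masks(detections, score_threshold=0.5):
    """Keep only the instance masks classified as 'person' with
    confidence at or above score_threshold.

    `detections` is assumed to follow the COCO-style output format
    (shown unbatched here for clarity)."""
    classes = np.asarray(detections["detection_classes"])
    scores = np.asarray(detections["detection_scores"])
    masks = np.asarray(detections["detection_masks"])
    keep = (classes == PERSON_CLASS_ID) & (scores >= score_threshold)
    return masks[keep]
```

Filtering by class and score like this discards detections of dogs, benches, and other COCO classes, so only the regions that actually need blurring are passed to the next stage.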

The Code

I created a Python script that performs video processing and inference on Google Colaboratory, which allows me to leverage cloud GPU resources. Colaboratory provides access to an NVIDIA Tesla GPU with 15 GB of memory, significantly more horsepower than my computer has! Currently, the code is not well optimized and consumes significant computing resources. Here is a link to the code in notebook form: person-detect-blur.ipynb

The notebook contains two main functions, runInferenceOverVideo() and blurFullVideo(). runInferenceOverVideo() runs inference over only 10 random frames of the input video and embeds the result in the notebook as an animated GIF. blurFullVideo() runs over the whole input video and writes the output to an .mp4 file. To run the script over the entire video, I used OpenCV to load, process, and output one frame at a time with custom FrameReader and FrameWriter classes.
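To illustrate the per-frame blurring step, here is a NumPy-only sketch of obscuring the masked pixels. The helper `pixelate_region` is my own stand-in (the notebook uses OpenCV, where `cv2.GaussianBlur` would be the more likely choice); it applies a mosaic-style blur by averaging each block of masked pixels:

```python
import numpy as np

def pixelate_region(frame, mask, block=16):
    """Replace masked pixels with the average colour of their block,
    producing a mosaic-style privacy blur.

    `frame` is an (H, W, 3) image array and `mask` is a boolean
    (H, W) array marking the pixels to obscure."""
    out = frame.copy()
    h, w = frame.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            m = mask[y:y + block, x:x + block]
            if m.any():
                patch = frame[y:y + block, x:x + block]
                mean = patch.reshape(-1, patch.shape[-1]).mean(axis=0)
                # Write the block average only into the masked pixels.
                out[y:y + block, x:x + block][m] = mean.astype(frame.dtype)
    return out
```

Applying the blur only where the mask is set leaves the dogs and the background untouched while making the people unrecognizable.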

Next Steps

I have some ideas for optimizing this process, including exploring neural network accelerator hardware and turning the script into an app. I believe that with further optimization, it would be possible to apply this application to live video from the user’s phone. The pre-trained Mask R-CNN model is heavier than this use case requires: it can recognize and mask many types of objects, while I only need one class, "person". TensorFlow provides tools to optimize models, which can help remove unnecessary bloat and greatly speed up inference.
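One of the TensorFlow optimization paths mentioned above is conversion to TensorFlow Lite with post-training optimizations. The sketch below uses a tiny stand-in Keras model for illustration; the real project would convert the detection model instead, and additional steps (such as a representative dataset for full quantization) would likely be needed:

```python
import tensorflow as tf

# Tiny stand-in model; the real project would convert the detection
# model rather than build one from scratch.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # weight quantization
tflite_model = converter.convert()  # bytes, ready to write to a .tflite file
```

Quantization shrinks the model and speeds up inference on mobile hardware, which is exactly what a live phone-camera version of this project would need.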

In conclusion, using pre-trained neural networks is an effective way to achieve automatic person detection and blurring in videos. With further optimization and refinement, this technology has the potential to be applied to a wide range of applications, including live video feeds.