Deep learning technology for video frame processing in face segmentation on mobile devices

Victoria M. Ruvinskaya
Yurii Yu. Timkov

Abstract

The aim of this research is to reduce the frame processing time of face segmentation in video on mobile devices using deep learning technologies. The paper analyzes the advantages and disadvantages of existing segmentation methods and their applicability to various tasks. Existing real-time implementations of face segmentation were compared across the most popular mobile applications that add visual effects to videos. The analysis showed that classical segmentation methods lack a suitable combination of accuracy and speed and require manual tuning for each particular task, whereas neural-network-based methods extract deep features automatically and achieve high accuracy at an acceptable speed. A method based on convolutional neural networks was chosen because, in addition to the advantages it shares with other neural-network-based methods, it requires comparatively fewer computing resources at run time. Existing convolutional neural networks for segmentation were reviewed, and DeepLabV3+ was selected because it offers sufficiently high accuracy and is optimized for mobile devices. The structure of the selected network was modified for two-class segmentation and to speed up inference on low-performance devices. For further acceleration, 8-bit quantization was applied to the values processed by the network. The network was adapted to face segmentation by transfer learning on a set of face images from the COCO dataset. Based on the modified and further trained segmentation model, a mobile app was created that records video with real-time visual effects. The app uses segmentation to apply effects separately to two zones: the face (color filters, brightness adjustment, animated effects) and the background (blurring, hiding, replacement with another image).
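
The abstract states only that 8-bit quantization was applied to the values the network processes. As a minimal sketch of what that involves, the code below assumes one common scheme, asymmetric per-tensor uint8 quantization with real ≈ scale * (q - zero_point); the scheme and the helper names `choose_params`, `quantize` and `dequantize` are illustrative assumptions, not the authors' implementation.

```python
"""Sketch of asymmetric per-tensor 8-bit (uint8) quantization.

Assumption: the abstract does not name the 8-bit scheme; this follows the
common affine mapping real = scale * (q - zero_point). Helper names are
hypothetical.
"""
from typing import List, Tuple


def choose_params(values: List[float]) -> Tuple[float, int]:
    """Pick scale and zero_point so that [min, max] maps onto [0, 255]."""
    lo = min(min(values), 0.0)  # range must include 0 so 0.0 is exactly representable
    hi = max(max(values), 0.0)
    if hi == lo:  # degenerate all-zero tensor
        return 1.0, 0
    scale = (hi - lo) / 255.0
    zero_point = int(round(-lo / scale))
    return scale, max(0, min(255, zero_point))


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Map floats to uint8 codes, clamping to the representable range."""
    return [max(0, min(255, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floats from uint8 codes."""
    return [scale * (q - zero_point) for q in codes]


activations = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zp = choose_params(activations)
codes = quantize(activations, scale, zp)
print(zp, codes)  # 64 [0, 48, 64, 96, 255]
```

Each stored value then occupies one byte instead of four, and for values inside the calibrated range the round-trip error is at most scale / 2.
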
Frame processing time in the application was measured on mobile devices with different technical characteristics. We analyzed the differences between test results for segmentation with the obtained model and segmentation with the normalized cuts method. The comparison shows reduced frame processing time on most devices, with a slight decrease in segmentation accuracy.
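
The abstract reports a slight decrease in segmentation accuracy but does not name the metric. Assuming intersection over union (IoU) of the face class on binary face/background masks, a common choice for segmentation, the comparison against a reference mask could be computed as in the sketch below; the function name `iou` and the list-of-lists mask layout are illustrative assumptions.

```python
from typing import List

# Hypothetical mask layout: one row per image line, 1 = face pixel, 0 = background.
Mask = List[List[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of the face class in two binary masks."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            inter += int(bool(p) and bool(t))  # pixel is face in both masks
            union += int(bool(p) or bool(t))   # pixel is face in at least one mask
    # Two masks with no face pixels at all agree perfectly.
    return 1.0 if union == 0 else inter / union


reference = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
predicted = [[0, 1, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]
print(iou(predicted, reference))  # 0.75
```

Averaging such a score over a test set for both the CNN model and the normalized cuts baseline would put a number on the accuracy trade-off described above.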

Section: Information technologies in energy systems engineering and manufacturing

Author Biographies

Victoria M. Ruvinskaya, Odessa National Polytechnic University, Shevchenko Avenue, 1, Odessa, 65044, Ukraine

PhD (Eng), Professor of the Department of System Software, Institute of Computer Systems

Yurii Yu. Timkov, Odessa National Polytechnic University, Shevchenko Avenue, 1, Odessa, 65044, Ukraine

Student of the System Software Department, Institute of Computer Systems